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Abstract 

We study a standard linear readout model of perceptual integration from a population of sensory 
neurons. We show that the readout can be associated to a set of characteristic equations which 
summarize the joint trial-to-trial covariance structure of neural activities and animal percept. These 
characteristic equations implicitly determine the readout parameters that were used by the animal 
to create its percept. In particular, they implicitly constrain the temporal integration window w and 
the typical number of neurons K which give rise to the percept. Comparing neural and behavioral 
sensitivity alone cannot disentangle these two sources of perceptual integration, so the characteristic 
equations also involve a measure of choice signals, like those assessed by the classic experimental 
measure of choice probabilities. We then propose a statistical method of analysis which allows to 
recover the typical scales of integration w and K from finite numbers of recorded neurons and record- 
ing trials, and show the efficiency of this method on an artificial encoding network. We also study 
the statistical method theoretically, and relate its laws of convergence to the underlying structure of 
neural activity in the population, as described through its singular value decomposition. Altogether, 
our method provides the first thorough interpretation of feedforward percept formation from a pop- 
ulation of sensory neurons. It can readily be applied to experimental recordings in classic sensory 
decision-making tasks, and hopefully provide new insights into the nature of perceptual integration. 



1 Introduction 



Most cortical neurons are noisy, or at least appear so to experimenters. When a sensory neuron's spikes 
are recorded in response to a well-controlled stimulus, they will show a large variability from trial to trial. 
This noisiness has been acknowledged from early on, as a nuisance preventing experimenters from easy 
access to the encoding properties of sensory neurons. But what is the impact of trial-to-trial sensory noise 
on the organism itself? This question gained renewed interest a few decades ago, with the generalization 



of experimental setups recording neural activity from awake, behaving animals (Mountcastle et al. 1990 



Britten et al. 1992 ). In these setups, animals are presented with a set of stimuli / and trained to respond 
differentially to different values of /, thus providing an (indirect) report of their percept of /. As neural 
activity and animal behavior are simultaneously monitored, it becomes possible to seek a causal link 
between the two. 

In such setups, one particular hypothesis — which we refer to as the "sensory noise" hypothesis — has 
proven instrumental in linking neural activity and percepts. It postulates that trial-to-trial noise at the 
level of sensory neurons is the main factor limiting the accuracy of the animal's perceptual judgements 
(Werner and Mountcastle 1965 Talbot et al. 1968). Indeed, signal detection theory provides the ade- 
quate tools to estimate such accuracies. Any type of biological response to a stimulus / — say r(/) — can 
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be associated to a signal-to-noise ratio (SNR) , which measures how typical variations in r due to a change 
of stimulus / (the signal) compare to intrinsic variations of r from trial to trial (the noise). When r[f) 
measures the response of a neuron to stimulus /, the resulting SNR is often called the neurometric 
sensitivity for that particular neuron. Alternatively, r{f) may also be the response of the animal itself 
to stimulus /. The resulting SNR is called the animal's psychometric sensitivity, which quantifies the 
animal's ability to discriminate nearby stimulus values /. Reformulated in terms of SNRs, the "sensory 
noise" hypothesis states that neurometric sensitivity, computed from the population of sensory neurons 
under survey, is equal to the psychometric sensitivity for the animal in the task. 

Applying this idea, neurometric and psychometric sensitivities have often been computed and com- 
pared, in various sensory systems and behavioral tasks (see, e.g., Romo and Salinas 2003 Gold and 



Shadlen 2007 for reference). However, it was progressively realized that most of these comparisons bear 
no simple interpretation, because the neurometric sensitivity is not a fixed quantity: it depends on how 
information is read out from the neurons. For example, if the various sensory neurons in the population 
behave independently one from another, then the overall SNR from the population will essentially be 
the sum of individual SNRs and thus, the experimenter's estimate of neurometric sensitivity will depend 
on how many neurons — say K — they included in their analysis. This intuition still holds in realistic 
populations where neurons are not independent, with the additional complexity that the evolution of 



neural SNR with K is very influenced by the correlation structure of noise in the population (Shadlen 



and Newsome 1998 Abbott and Dayan 1999 Averbeck et al. 2006). 



More subtly, another parameter has a direct influence on estimated neurometric SNRs: the time scale 



w used to integrate each neuron's spike train, to describe the neuron's activity over the trial ( Cohen and 



Newsome 2009 1 . Indeed, through the central limit theorem, the more neural spikes are integrated into 
the readout, the more accurate that readout will be. Adding extra neurons through K, or extra spikes 
for each neuron through w, will thus have the same type of impact on the readout's overall SNR. In fact, 
if all neurons from the population are identical, independent Poisson encoders, one can easily show that 
the readout's overall SNR scales with y/wK, emphasizing the duality between K and w. 

As there is no unique way of reading out information from a population of sensory neurons, a question 
naturally arises: what type of readout does the organism use? For example, how many sensory neurons 

and what typical integration time scale w, provide a relevant description of the animal's percept 
formation? The "sensory noise" hypothesis can precisely be used to answer this question: the 'true' 
neuronal readout for the organism must be the one providing the best account of animal behavior. 
However, the previous K-w discussion clearly shows that comparing neurometric SNR to psychometric 
SNR is not sufficient to target the true readout: there will be several combinations of K and w leading 
to the same overall neurometric SNR, while corresponding to very different extraction strategies by the 
animal. Thus, an additional experimental measure is required to recover the typical scales of integration 
of the true readout. 

Choice signals are a good candidate for this additional measure. In two-alternative tasks, where the 
animal must report a binary discrimination of stimulus value (say, / > or / < 0), choice signals are 



generally computed in the form of choice probabilities (CP) (Green and Swets 1966[ Britten et al. 1996) 



CP is computed for each recorded neuron individually, and quantifies the trial-to-trial correlation between 
the activity of that neuron and the animal's ultimate (binary) choice on the trial, all other features being 
held constant. In particular, since CP is computed across trials with the same stimulus value (generally 
uninformative, i.e., / = 0), the observed correlations cannot reflect the influence of stimulus on neural 
activity and animal choice. Instead, a signiflcant CP can only result from the process by which the 
neuron's activity influences — or is influenced by — the animal's forming perceptual decision. 

It is intuitively clear that CPs reveal something about the way information is extracted from sensory 
neurons. For example, if the animal's percept is built from a single neuron, then that neuron will have a 
very large CP, because its activity on every trial directly predicts the animal's percept. Instead, if several 
independent neurons contribute to form the animal's percept, then they are all expected to have low 
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CP value, as the activity of each neuron has only a marginal impact on the animal's decision. However, 
converting this intuition about choice signals into a quantitative interpretation was long hampered by the 
fact that, just like neurometric SNR, CP values are largely influenced by the population's noise covariance 
structure. For example, a neuron may not be utilized by the animal to form its percept, and yet display 
significant CP because its activity is correlated with that of another neuron being utilized. As a result, 
early studies relating CP values to the animal's perceptual readout only relied on numerical simulations 
(IShadlen et al.l 119961 ICohen and Newsomel 120091) , assuming very specific noise correlation structures 



that weakened the generalizations of their results. Only very recently have Haefner et al. (2013) provided 
an analytical expression for CP values in the presence of noise correlations (see section 4.2), opening the 
door to general, quantitative interpretations of choice probabilities. 

In this article, we show how the combined information of animal sensitivity (SNR) and choice signals 
allows to estimate the typical scales of percept formation by the animal, both across neurons (number of 
neurons involved K) and in time (integration window w). Our results apply in the standard feedforward 
model of percept formation, and can be derived for any noise covariance structure in the neural population. 
We first show how the joint covariance structure of neural activities and animal percept leads to a set 
of characteristic equations for the readout, which implicitly determine the animal's perceptual readout 
policy across neurons and time. Then, we show how these characteristic equations can be used in a 
statistical form, across the ensemble of trials and neurons available to the experimenter, to determine 
the typical scales K and w of percept formation from the activity of sensory neurons. This approach 
is mandatory since experimental measurements can only provide statistical samples of the full neural 
population. Using an artificial neural network to provide sensory encoding, we show that our method 
can reliably recover the true scales of perceptual integration, without requiring full measurement of the 
neural population. Thus, our method can readily be applied to real experimental data, and provide new 
insights into the nature of sensory percept formation. 



2 Methods 



2.1 Framework and notations 

We place ourselves in a general framework, describing a typical perceptual decision-making experiment 
(Fig. [T]). On each trial, a different stimulus / is presented to the animal (Fig. [T^, top), which then 
takes a decision according to its internal judgement /* of stimulus value. Our framework assumes that 
this percept /* is directly available to the experimenter on each trial. In real experimental setups, 
the animal's report is generally more indirect — typically a binary choice based on the unknown percept 
/*. We choose the former approach because it applies generically to most perceptual decision-making 
experiments, whereas the "choice" part is more dependent on each particular setup. We detail later how 



both approaches can be reconciled through simple models of the animal's behavior (section 4.2 1 



Simultaneously, experimenters record neural activities from a large population of sensory neurons, 
which is assumed to convey the basic information about / used by the animal to take its decision 
(Fig. [I^, bottom) . Typical examples could be area MT in the context of a moving dot discrimination 
task (e.g., [Britten et al. 1992), area MT or V2 in the context of a depth discrimination task (e.g., Uka 



and DeAngelis |2003[ Nienborg and Cumming 2009[ ), or area SI in the context of a tactile discrimination 



task (e.g., Hernandez et al.( "2000). We describe the activity of this neural population on every trial as a 



point process s{t) = {sj(i)}j=i..jVt„t, where each Si(t) is the spike train for neuron z, viewed as a series of 
Dirac pulses. As an important remark, TVtot denotes the full population size, a very large and unknown 
number. It is not the number of neurons actually recorded by the experimenter, which is generally much 
smaller. 

For simplicity, we assume a fine discrimination task, where the different stimulus values / presented 
to the animal display only moderate variations around a central value, say / = 0. This substantially 
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simplifies SNR computations, because the 'signal' part of any response r{f) is then summarized by its 
slope in / = : dfF,{r\f)^f^Q, where E() denotes the average response over trials. We assume that this 
linearization with / can be performed both for the psychometric report /*, and for individual neuron 
activities. This is mostly a convenience though, and the framework could be generalized to more complex 
dependencies on stimulus /. 

From the raw data of /* and s{t) on each trial, a number of measures are routinely used to describe 
neural activity and animal behavior. First, the psychometric sensitivity Z* describes the animal's accu- 
racy in distinguishing nearby frequency values. It can be computed from the distribution of (/, f*) across 
trials (Fig.[l]D), according to the formula: 

^* = (Var(/*|/))/ 

where notation (.) f denotes an average across stimulus conditions. This is exactly the (squared) SNR 
for random variable f*{f), assuming that the 'signal' term 9/E(/*|/) is equal to 1 because the animal's 
average judgement of / is unbiased (the framework easily generalizes to a biased percept). 

On the other hand, for each recorded neuron, it is common practice to compute its peri-stimulus time 
histogram (PSTH) in response to each different tested stimulus (Fig.[ljl): 

A.(t;/):=E(s,(t)|/), (2) 

where E denotes averaging over trials. Since all stimuli / are assumed to be close one from another, the 
dependency of Xi{t ; /) on / is essentially linear, and can be summarized by the (temporal) tuning curve 
for the neuron (Fig.[lj3): 

m-=dfKit-J). (3) 

Furthermore, as recent techniques allow the simultaneous recording of many neurons, experimenters 
also have access to samples from the trial-to-trial covariance structure in the population (Fig. [l|:) . For 
every pair of neurons (?,j) and instants in time (i, s), this covariance structure is assessed through the 



neurons' joint peri-stimulus time histogram (JPSTH, Aertsen et al. 1989) 



7.,(t,s):=(Cov(s,(t),s,(s)|/))/. (4) 

We only consider the average covariance structure, over different stimuli /. First, as above, nearby values 
of / insure that the covariance structure will remain mostly unchanged. Second, trial-to-trial covariances 
correspond to second-order effects on neural activity, which require several trials to be reliably estimated — 
another reason to lump data from different stimuli / into a single estimate. 

Finally, we can measure a choice signal for each neuron, estimating the trial-to-trial covariance of 
neuron activity Si{t) with the animal's choice (Fig. [ij'). Since in our framework the animal directly 
reports its percept f*, we readily describe the choice signal of each neuron by its percept covariance 
(PCV) curve: 

<(t):=(Cov(5,(t),r|/))/. (5) 

Again, this covariance information is lumped across the different (nearby) stimulus values /, in order 
to improve experimental measurement. The PCV curve captures the core intuition behind the more 
traditional measure of choice probability (CP), while retaining a linear form convenient for analytical 
treatment. Percept covariance curves are not directly measurable in classic experimental setups where 
the animal only reports a binary choice ; however their analytical link to available measures such as CPs 



can be easily derived given simple models of the animal's decision policy (see section 4.2). 

Unlike many characterizations of neural activity that rely only on spike counts, our framework re- 
quires an explicit temporal description of neural activity through PSTHs (eq. [2]), JPSTHs (eq. |4| and 
percept covariance curves (eq. [5| . Indeed, our method ultimately predict when, and how long, perceptual 
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Figure 1. Framework and main experimental measures, (a) Experimental setup. Top: A set of stimulus 
values / (color-coded as blue, yellow, red) are repeatedly presented to an animal, which reports its percept 
/* on each trial (color-coded as green). Bottom: In each session, several task-relevant sensory neurons 
are recorded simultaneously with behavior, (b) Perceptual sensitivity Z* is defined as the square SNR 
of the animal's reports f*{f)- (c) The noise covariance structure can be assessed in each each pair of 
simultaneously recorded neurons, as their joint peri-stimulus histogram (JPSTH). (d) Trial- wise response 
of a particular neuron. Each thin line is the schematical representation of the spike train on each trial. 
Segregating trials according to stimulus (top), we access the neuron's peri-stimulus histogram (PSTH) 
and its tuning curve — shown in panel (e). Segregating trials according to the animal's perceptual error 
/*(/) — / (bottom), we access the neuron's percept covariance (PCV) curve — shown in panel (f). 
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integration takes place in the organism. Readers may feel uncomfortable that the resulting definitions are 
directly expressed over trains of Dirac pulses. While these notations are fully justified in the framework of 
point processes ( Daley and Vere- Jones, 2007| ), they describe idealized quantities that cannot be estimated 
from a finite number of trials, leading to jaggy estimates formed from the collection of Dirac peaks. So 
in practice, spike trains Si(t) are computed in temporal bins of finite precision. 



2.2 The readout model and its characteristic equations 

All experimental measures described above, taken together, provide a full characterization of the joint 
covariance structure of variables (s(t),/*) across stimuli and trials (Fig. [2j;). The key argument to 
exploit these data, which is actually a reformulation of the 'sensory noise' hypothesis, is that the animal's 
percept /* is built on every trial from the activity of the sensory neurons, meaning that /* — F*{s) 
for some unknown readout F* . As a result, each proposed readout F directly yields an estimate for 
the joint covariance structure of {s{t), /*) — through a set of relationships which constitute the readout's 
characteristic equations. Conversely, since this joint covariance structure is experimentally measurable, 
it implicitly constrains the nature of the true readout F* which was applied by the animal. In this 
section, we introduce a generic form of linear readout, stemming from the standard feedforward model of 
perceptual integration, and derive its characteristic equations. We show that in theory, these equations 
totally characterize the readout applied by the animal. 



2.2.1 Readout model 



We define a generic linear readout from the activity of sensory neurons s(t) (Fig. [2p,), based on a given 
readout vector: a — {a''}i=i,,,N^^^, a given integration kernel with normalized shape h and time constant 



hw{t) ■— w ^h{T/w), and a given readout time tn: 



0'^Si{tii — u)hyj{u)du. 



(6) 



n>0 



The readout is noted /, as it must ultimately be an estimator of stimulus value /. We explicitely note the 
dependence on t]^ to emphasize that f(tfj) is built from a sliding temporal average of the spike trains ; 
so that each instant in time yields a potential readout. 

This is a classical form of readout from a neural population, which has often been used previously and 
described as the 'standard' model of perceptual integration ( Shadlen et al.[[l996| Haefner et al.[ 2013). 
The temporal parameters /i^, and tn describe how each neuron's temporal spike train Si{t) is integrated 
into a single number describing the neuron's activity over the trial: = J^yQ-^iitR — u)hw{u)du. In 

turn, the percept is built linearly from the population activity as f — a^Ji through a specific readout 
vector, or 'perceptual policy', a. 

However, traditional studies generally make ad hoc choices for the various constituants of this readout. 
Most often, simply describes the total spike count for neuron i, which in our model corresponds to 
choosing a square kernel h, and parameters w — t^ — T describing an integration over the full period 
T of sensory stimulation. As mentionned in the introduction, there is no reason that this should be 
a relevant description of sensory integration by the organism: the integration window w has a direct 
influence on predicted SNRs for the readout, and experiments suggest that animals do not always use 



the full stimulation period to build their judgement (Luna et al. 2005 Stanford et al. 2010). 



Instead, we make no assumption on the nature of w and tji, and view them as free parameters of the 
model. Then, the model parameters implicitly characterize the typical scales of perceptual integration by 
the animal. The number of significantly nonzero entries in a, say defines the number of neurons 
contributing to the percept. The readout window w characterizes the behavioral scale of temporal 
integration from the sensory neurons, and time t^ characterizes when during stimulation this integration 
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Figure 2. Linear readout and its interpretation, (a) We study a "standard" model of perceptual 
readout, with two parameters w and tfj defining integration in time, and a readout vector a defining 
integration across neurons, (b) Geometric interpretation of the model. The temporal parameters w and 
tfi define the tuning vector b and noise covariance matrix C in the population. Colored ellipses schematize 
the distribution of neural activities from trial to trial, for the three possible stimulus values. The readout 
/ can be viewed as an orthogonal projection of neural activities in the direction given by a. (c) Sensitivity 
Z, PCV curves ni{t) and noise covariance JPSTHs 7y (t, s) totally define the joint covariance structure 
between spike trains s{t) and percept /. (d) Any feedforward readout of neural activities can be viewed 
as a mapping / = F{s{t)), so the true F is implicitly constrained by the covariance data from panel c. In 
the case of the linear readout model, these constraints are summarized by three characateristic equations, 
which relate neural and perceptual data through the readout's parameters w, tn and a. 
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takes place. The exact shape h given to the integration kernel is of less importance ; for conceptual and 
implementational simplicity we assume it to be a square window. However, we note that (1) other shapes 
may have a higher biological relevance, such as the decreasing exponential mimicking synaptic integration 
by downstream neurons, and (2) nothing prevents our method from making h itself a free parameter, 
provided the data contain enough power to estimate it. Finally, our model can also be extended to 
versions where extraction time is not fixed, but varies from trial to trial ; this issue is discussed in 
section 14.3.21 

2.2.2 Characteristic equations for the readout 

Thanks to its linear structure, the readout defined in eq. [6] allows for a simple characterization of the 
covariance structure that it induces between neural activity s{t) and the resulting percept / (Fig.jijD). We 
show in appendix |X] that this covariance structure can be summarized by three characteristic equations: 

1 - 6^ a, (7) 
Z-^ = a^Ca, (8) 
7r(t) - r(t)a, (9) 

where vector b and matrices T{t) and C respectively describe the population's tuning and noise covariance 
structures, derived from the underlying neural statistics /3(t) and 7(t) introduced in eq. 3]|4 



bi{w,tR) := / (}i{tii-u)h^{u)du, (10) 



Ji>0 



rij{t\w,tji) -.^ ^ij{t,tji- u)h^{u)du, (11) 
Ju>a 

Cy(w,tfl):=/ T,j{tii - u)h^{u)du. (12) 

We here note the explicit dependency of b, T[t) and C on the temporal parameters of the readout w and 
t^. We will generally omit it in the sequel. Thus, the right-hand sides of eq. [7]|9] depend only on readout 
parameters w, tn, a and on the statistics of neural activity, independently of the animal's percept. 

On the other hand, the left-hand sides of eq. [7][9] describe experimental quantities related to the 
readout's resulting percept /. The first line describes the average tuning of / to stimulus /, that is 
9/E(/|/), which is equal to 1 because we assume that / is unbiased. The second line expresses the 
resulting sensitivity Z for the readout, defined as in eq. [l] It reveals the dual infiuence of the number 
of neurons (through a) and integration window w on the readout's overall sensitivity: indeed, under 



mild assumptions, the covariance matrix C scales with w ^ (see appendix A. 2.1 ). Finally, the third line 
expresses the resulting covariance between / and the activity of each neuron Si{t), defined as in eq. [i] 



This is essentially the relationship already revealed by Haefner et al. (2013), that choice probabilities are 
related to readout weights through the noise covariance matrix ; however, our formalism focuses on the 
simpler linear measure of PCV curves, and explicitly takes time into account. 

Both the neural measures f3{t) and 'j{t, s) on the right-hand side, and the percept-related measures 
Z and 7r(i) on the left-hand side, can be estimated from data. As a result, the characteristic equations 
define an implicit constraint on the readout parameters w, Ir and a (Fig. [2ji). Actually, if the readout 
model in eq. |6]is true, and precise measures are available for all neurons in the population, one sees easily 
that these constraints would uniquely determine the readout parameters. Indeed, for fixed parameters w 
and tfl, eq. [T] and [9] impose linear constraints on vector a. These constraints are generally overcomplete, 
since a is A'tot-dimensional, while each time t in eq. |9] provides A'tot additional linear constraints. Thus, 
in general, a solution a will only exist if one has targeted the true parameters w and Ir, and it will then 
be unique. 
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2.3 Estimating the scales of sensory integration 

In the previous section we have shown that, in the standard hnear model of percept formation, the 
trial-to-trial covariance structure between spike trains s(t) and the resulting percept / leads to a set of 
characteristic equations which implicitly define the parameters of the perceptual readout, provided the 
covariance structure has been fully estimated. 

Unfortunately, this direct approach makes a fundamental assumption which cannot be reconciled with 
real, experimental recordings: it assumes we have recorded all neurons from the population under survey, 
whereas real recordings only ever record from a small subset of that population. Thus we cannot hope 
to reconstruct the real vector a, simply because some — probably most — of the neurons contributing to a 
were not recorded. Moreover, even across those neurons which were recorded through a series of sessions 
in a given area, the noise covariance structure can never be fully assessed ; it remains elusive between 
neurons which were not recorded simultaneously. 

For this fundamental reason, the characteristic equations [7]|9] should be used with a different perspec- 
tive than the full recovery of readout parameters. Instead, we propose to exploit the structure of the 
equations in a statistical approach, with the restricted goal of estimating the typical scales of readout 
most compatible with recorded data. 



2.3.1 Reformulation in terms of neural subensembles 



A first necessary step in our approach is to statistically describe the nature of readout vector a. We are 
mostly interested in the support of a, meaning, the number and nature of neurons contributing to percept 
formation. Thus, we assume that the percept is built only from the activities of an unknown ensemble K. 
of neurons in the population and that, for given JC and temporal parameters {w,ti^), the readout vector 
a is chosen optimally to maximize the SNR of the resulting percept. Indeed, through this hypothesis, we 
totally reformulate the problem of characterizing a in that of characterizing /C ; which allows for much 
simpler statistical descriptions. 

The readout vector a achieving the maximum sensitivity Z in eq. [8] under the constraints of eq. [7] 
and having support on /C, is well known from the statistical literature. It is uniquely given by Fisher's 



linear discriminant formula (Hastie et al. 2009): 



1 



(13) 



where a/c, b/c and C^c are the versions of vectors a, b (eq. 10) and matrix C (eq. 12) restricted to neuron 
ensemble K.. By injecting the form (cq. 13 ) into eq. 8][9 we obtain a new version of the characteristic equa- 
tions, under the assumption that percept is built optimally from some given ensemble JC, and temporal 
parameters {w,tj^): 



■Ki{t \K.,W,tii) 



Z{K.) 



r 



(14) 
(15) 



Z in eq. 14 is the (optimal) sensitivity associated to this particular choice of /C, w and 1^. In cq. 15 
■Kiit) is the resulting, predicted PCV curve for every neuron i in the population (not only in ensemble 
JC). Tix,{t) is a row vector whose entries are equal to ^ij{t) (eq. 11 ) for neurons j G IC. 



These equations open the door to a statistical description of percept formation in the neural popula- 
tion: we can now parse through a large set of candidate ensembles K, and temporal parameters {w,t]i), 
and ask when the predictions for sensitivity (eq. [T4| ) and PCV curves (eq. 15 ) match their true psy- 
chophysical counterparts Z* (eq. [T]) and vr* (t) (eq. [5|). For sensitivity, the straightforward comparison is 
to require that Z{IC \w, tn) « Z*. 
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On the other hand, for the PCV equatfon (eq. [15|, it is pointless to search an elementwise match 
for every neuron i, between the predicted curve 7rj(t) and its true measure 7r*(t). Indeed, since only a 
small subset of the neurons have been recorded, no candidate readout ensemble /C will be equal to the 
true ensemble (say /C*) that was used by the animal ; and there is no guarantee that the covariance 
structure between i and /C, which gives rise to prediction (eq. |15[ ), should be similar to that between i 
and IC* . Instead, a given set of readout parameters (/C, w, tji) should be deemed plausible if they predict 
the correct distribution of PCV signals across the population, irrespective of exact neuron identities i. 
Full distributions are difficult to estimate from finite amounts of data, and we will find the following 
population averages to convey sufficient information: 



W{t \IC,w,tii) E,(bi{w,tR)TTi{t |/C, 
W*{t \w,tii) ■.= E,(b,iw,tR)7r*it)y 



(16) 
(17) 



where denotes averaging over the full population of neurons i = 1. . . A^tot- We will deem a set of 
readout parameters plausible if they yield W{IC \w,tfj) « M^*(w,ffl|^ Multiplying each PCV curve 
by the neuron's tuning bi (eq. 10) yields more stable estimates for W, as discussed in section 3.2 and 
appendix |B] 



2.3.2 Statistical constraints on readout scales 

There are many ways to compare the real values of sensitivity and PCV signals, to their predictions 
given by eq. ppS) We propose here an ad-hoc method, whose main characteristics are the following: (1) 
focus mostly on first-order statistics (i.e., means) across the neural population, (2) use arbitrary tolerance 
values to compare real and predicted data, (3) fit the two indicators sequentially: first SNR, then percept 
covariance. Due to its simplicity, this method will prove robust to measurements errors arising from finite 



amounts of data (section 3.3 ) 



Our method is also designed to cope with a fundamental limitation of real recordings: all neurons 
(ensemble /C, neurons i) contributing to predictions eq. T4p^ must have been recorded simultaneously, 
to assess their noise covariance structure. This constraint sets a limit on ensemble sizes K which can be 



easily investigated (but see section 4.4). Moreover, it prevents from estimating the full average of choice 
signals (eq. 16 ) predicted by a given ensemble K. — it is only available for simultaneously recorded neurons 
i. As a result, predictions (eq. 15) from different tested ensembles K, must somehow be aggregated to 
produce a reliable prediction of choice signals. 

We propose that each tested ensemble IC contribute to our estimates in proportion to its ability to 
account for the animal's sensitivity: 



Pz{IC \w,tj^) exp 



iZ{IC\w,tR)-Z*f 
2a| 



(18) 



normalized to insure ~ 1 across all tested ensembles {w and tn being fixed). Parameter az 

is the required tolerance for the fit, set by the experimenter. It is a regularization parameter creating a 
tradeoff between precision of fit (small az) and reliability of measurements, since a larger az leads to 
more samples K. with a substantial contribution PzilC). When testing our method (section [s]) we choose 
az as 5% of Z*. 

For each tested couple {w, tji), we then use Vz{JC \w, tji) as a weighting factor over all tested ensembles 

-"^Notc that W*(t \w,tji) depends on parameters {w,tji) only through the neurons' tunings bi{w,tji). In practice, as 
neural activities arc rather stationary in time, W*{t) changes very little for different values of parameters (w,tii). 
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/C, which yields two quantities: 

k{w, tR) := \w, tfl,)Card(/C), (19) 

K 

W{t \w, Ir) := PzilC \w, tfl)E,(yc) {hTT,{t |/C, w, Ir)) , (20) 

These 
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where Ei(ic) denotes an average across all neurons i available to compute a prediction with eq. 
neurons must have been recorded simultaneously to ensemble K. and, in order to produce an unbiased 
estimate of choice signals in the full population, they should not belong to K. itself. 



In eq. 19 K{w,tR) is the ense mbl e size K which most likely explains the animal's sensitivity, given 



readout parameters (w^tR). In eq. 20 W{t \w,tR) is the mean prediction for PCV signals biTTi{t) across 
neurons i in the population, but stemming only from ensembles /C which are compatible with the animal's 
sensitivity. Considering quantity W{t) introduced in eq. [T6j we see that 

W{t \w, tR) ~ ^ P2(/C \w, tR)W{t |/C, w, Ir). (21) 

K 

The equality is only approximate, because only neurons i recorded simultaneously to /C are available to 
estimate W{t). However, as neurons i are random and we average over many ensembles /C, W(t) rapidly 
converges to the quantity described in eq. |21[ 

Both W{t \w,tR) and W*{t \w,tR) are temporal signals defined over some interval [r,ni„ , Tmax] corre- 
sponding to one trial repetition. Defining the L2 norm for such temporal signals as 

= (r„,ax - T„,i„)-W x\t)dt, (22) 

we will deem parameters {w,tR) plausible if they lead to a small value of \\W{w,tR) — W*{w,tR)\\^ . 
To yield a quantitative estimate of fit, we introduce a tolerance aw and define the following weighting 
function; 



\W{w,tR)~W*{w,tR)\ 



2 



Pw{w, tR) - exp I ' n 2 ' 1 ' (23) 



normalized to insure X)™ -Pvk(w, tij) — 1 across all tested temporal parameters {w,tR). Again, toler- 
ance aw is set arbitrarily by the experimenter. When testing our method (section [s]) we choose aw as 
5%of\\W*{w,tR)\\. 



Overall, the statistical method introduced above reduces readout parameters to three numbers: the 
temporal extraction parameters w and tR, and the typical number of neurons K used by the readout. 
Thus, we can now apply a 'brute-force' approach: test all possible combinations (if, compute 
the population statistics from eq. [I8p3| and target the parameters that provide the best fit. In the 
next section, we show the validity of this statistical approach, which allows us to recover the typical 
scales {K, w, tR) of perceptual integration in an artificial network simulation. We further detail how 
this statistical approach can be adapted to counteract measurement errors which typically arise in real 
experiments from the finite number of available trials. 



3 Results 

3.1 Artificial neural network 

In this section, we show how the statistical analysis of sensitivity and choice signals described above 
allows to recover the scales of integration of the neural readout. Naturally, to assess the validity of our 
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method, it is necessary to know the true nature of this readout. This can only be achieved through an 
artificial simulation of sensory integration, where we have full control on neural activities and readout 
procedure. 

We thus implemented an artificial neural network, that encodes some input stimulus / in the spiking 
activity of its neurons (Fig. [3^). Precise parameters of this network are provided as Supplementary 
Material (section SI). Briefly, on each trial, 100 input Poisson neurons fire with rate /, taking one of three 
possible values 25, 30 and 35 Hz. The encoding population per se consists of 500 leaky integrate-and-fire 
(LIF) neurons. 100 of these neurons receive sparse excitatory projections from the input Poisson neurons, 
which naturally endows them with a positive tuning to stimulus /. 100 other neurons receive sparse 
inhibitory projections from the Poisson neurons, which naturally endows them with negative tuning. 
The remaining 300 neurons receive no direct projections from the input. Instead, all neurons in the 
encoding population are coupled through a sparse connectivity with random delays up to 5 ms. Synaptic 
weights are random and balanced, tuned to ensure overall firing rates around 30 Hz. We implemented 
and simulated the network using Brian, a spiking neural network simulator in Python (iGoodman and 



Brette 2008 1 . The statistics of activity for the resulting population are depicted in Fig. |3|3,c,e. 



We then define the true perceptual readout from this network. We pick a random set of K* = 40 
neurons in the population, whose activity is integrated over w* = 50 ms and read out at time ijj = 80 ms, 
on each stimulus presentation (each presentation lasting 500 ms). The resulting estimator /* of stimulus 
value is built optimally given these constraints, through Fisher linear discriminant analysis (eq.fTspj This 
leads to a 'psychometric' sensitivity Z* « 0.06 }iz~^, meaning that the network can typically discriminate 
variations of {Z*)~^/^ « 4.2 Hz in the input /. (For comparison, over the same integration period w*, 
the 100 input Poisson neurons can discriminate variations around 2.5 Hz.) We then compute a PCV 
for every recorded neuron, measuring its trial-to-trial covariance with estimator /* (Fig. |3|l). Applying 
the statistical method described above, our goal is now to recover the scales {K* ,w* ,t*j^ of perceptual 
integration, on the basis of the experimental measures depicted in (Fig. [3j;-e). 

3.2 Validation on full population data 

To show the theoretical validity of our analysis, we first apply it to a situation where the trial-to- 
trial covariance of neurons and percept /* (eq. l][5) has been fully measured, with high precisioi^ In 



particular, we assume full knowledge of the noise covariance structure ^(t, s) in our population (eq. |4|) 
Actually, in these irrealistic conditions, the statistical analysis described above is not useful: instead, one 
could directly solve the characteristic equations (eq. 7]|9| to recover the readout parameters a, w and 



t/j. However, this is a necessary first step to verify that our method is not flawed theoretically. Do the 
statistical quantities introduced in eq. T8p3 allow to recover the true scales of integration {K* ,w* ,1^^)! 



Assuming a square integration kernel h, we test a set of candidate temporal integration windows w 
from 10 to 100 msec, and a set of candidate readout times t/j from 10 to 200 msec, all in steps of 10 
msec. We then pick randomly candidate neural ensembles /C, of sizes ranging from 2 to 90 neurons, with 
50 different random ensembles for each tested size K. For each tested parameters {w,tj^), we compute 



the distribution of predicted SNRs Z{IC) given by eq. 14 across all candidate neural ensembles (Fig. W 



Each neural sample K. is then associated to a weight PzilC) describing the goodness of fit to the true 
SNR Z* (eq. 18 ). Following eq. igfio this yields an estimate for the best-fitting population size K{w, tp) 
(Fig.ji^, dashed vertical line) and mean PCV curve W{t jw, tfj) (Fig.|4j3). Since we assume full knowledge 
of experimental data, all 500 neurons i are involved in estimating W{t), independently of ensemble /C. 

In Fig. |4j;, we show the estimated population size K{w, tji) as a function of w and i/j. It shows the 
mark of the K-w tradeoff on sensitivity, mentioned in the introduction: smaller integration windows w 



^To avoid ovorfitting issues, the trials used to learn the optimal /* are not used in the subsequent analysis. 

^To compute these estimates, all 500 neurons were simultaneously monitored over 16,500 repetitions of each stimulus 
condition (not counting the trials used to train estimator /*). Using bootstrap resamplings over stimulus repetitions, we 
checked that the resulting measures were virtually error-free. 
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Figure 3. Artificial neural network used for testing the method, (a) Network architecture. The encoding 
layer consists of LIP neurons coupled through a sparse, balanced recurrent connectivity with random 
delays. Stimulus / is the firing intensity of a group of input Poisson neurons, which project sparsely 
into two subpopulations of the encoding layer (neurons excited by the input vs. neurons inhibited by the 
input). The majority of neurons in the encoding layer receive no direct projection, but can still acquire 
stimulus tuning through the recurrent connections. A "true" readout /* is produced on every trial on 
the basis of true parameters w*, and K* — which should be retrieved by our method, (b) Classic 
population statistics in the encoding layer, for neural spike counts over a trial, (c) Sample PSTHs from 
the encoding layer. Model neurons display varied firing rates, and tunings of different polarities, (d) 
Sample PCV curves for the same neurons as panel c, computed by correlating each neuron's spikes with 
the true readout (e) Sample JPSTHs (noise correlations) for pairs of neurons in the encoding layer. 
Inset: corresponding cross-correlograms, obtained by projection along the diagonal. 
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Figure 4. Statistical recovery of readout parameters: noiseless measures, (a) For each tested temporal 
parameters {w,tii), predicted sensitivities Z(IC) are computed for several candidate readout ensembles 
K, of varying sizes. The goodness of fit to true sensitivity Z* defines a weighting function Pz{JC) across 
ensembles, (b) The weighting function is used to compute a compound prediction W(t) for the average 
PCV signal in the population, which is compared to the true average W*{t). The three columns in 
panels a-b correspond to different candidates {w,tji) for temporal integration, (c) Best-fitting ensemble 
size K depending on candidate parameters {w,ti^). The K — w tradeoff on sensitivity is clearly visible, 
(d) Goodness of fit of PCV signals depending on candidate parameters {w,tj^) shows a clear optimum 
around the true parameters of the readout, (e) Same as panel d, but transformed into a weighting function 
Pwiwjtfi) over candidate temporal parameters. 
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require larger ensenible sizes K to account for the animal's sensitivity. In Fig. |4[i, we show two measures 
of the resulting fit between W{t and its true value W*{t \w^tj^). In the first panel, we plot the 

plain L2 norm between the two temporal signals (using Tmin = —100 msec and Tmax = 200 msec as 
integration bounds). In the second panel, we reexpress this L2 norm as a weighting Vwiw^tfi) over the 



set of tested temporal parameters (eq. 23 1. Applying this final weighting over candidate values w, t^j. 



and K(w,tR) yields numerical estimates for the scales of the readout: 

{y = 54 ± 9 msec 
t/j = 81 ± 6 msec 
34.7 ±8.7 

These estimates are very close to the true values w*, and K* , showing the theoretical validity of this 
approach. The estimated K is somewhat smaller than its true value K* ~ 40, however this is no bias in 
our method: it simply means that the 40 neurons chosen randomly as the source of percept were slightly 
less sensitive than the 'average' 40 neurons in the population. 

We also remind that these estimates depend on the tolerance levels fixed by the experimenter to 



compute P2(/C) (eq. 18) and Vw{w,tfi) (eq. 23). Numerically, we find the resulting mean estimates to 
be rather stable across a range of sensible tolerances. On the other hand, the resulting error bars — which 
are obtained as second-order moment of the quantities weighted by Vw^w^tp) — only describe the typical 
variations of the parameters that lead to estimates within the fixed tolerances. In particular, driving the 
tolerances to zero always drives the error bars to zero, even though the predicted averages may become 
false as too little data enter their computation. 

Why does the method work? Essentially, it proceeds in two successive steps. First, {w^tu) being held 
fixed, it uses SNR information to target plausible neural ensembles K, (Fig. [4^,c). Since the readout is 
assumed to be optimal, the mean SNR can only increase with the size K of the ensembles considered 
(Fig. |4^, plain blue curve) . Plausible ensembles K are those lying near the crossing of this curve with the 
true 'psychometric' SNR (Fig. |4^, dashed red curve). For a straightforward application of our method, 
this crossing should occur within the typical ensemble sizes K tested — which are, in practice, limited 
by the number of simultaneously recorded neurons. In section [4. 4[ we discuss possible extensions of the 
method to the case where the crossing does not occur. 

Second, {w,tfj) being still fixed, an average PCV prediction W{t \w,tif) is built, using the neural 
ensembles K, targeted above, and compared to the true mean PCV curve W*{t \w,tji). It is not trivial 
that this comparison should work. To simplify the argumentation, let us assume that parameters w and 
tfl are fixed at their true values w* and tjj. On the one hand, since the true percept is built from some 



(unknown) neural ensemble JC*, we have W*{t) = W{t |/C*), using the notations of eq. 16pT On the 



other han d, th e prediction W{t) is built as a compound mean of W{t |/C) over several candidate ensembles 



/C (see eq. 21 1. As our method requires a match between W*{t) and W{t)^ it implicitly supposes that all 
ensembles K. contributing to W{t) lead to very similar population averages W{t \K). 

Predicted curves W{t |/C) are generally not available experimentally. However, we can compute 
them in our full-data simulation (Fig. [5|. We find that, amongst ensembles /C of similar size the i- 
population means of bi'Ki{t |/C) rapidly converge to a single curve, independently of ensemble K, (Fig. [5]d). 
Furthermore, this result is not trivial: when the same analysis is performed on the plain PCV curves, 
not multiplied by tuning 5,;, the convergence does not occur anymore (Fig. [5^), or at least not as fast. 
In Fig. [sj:, we plot the ratio between the variance of curves accross different ensembles /C, and the power 
of the mean curve, across all ensembles K, of same size K. This ratio quickly drops to zero for the 
tuning- multiplied PCV curves (blue), but not for the plain PCV curves (green). 

To summarize, it is crucial for our method that each PCV curve 'Ki{t) be multiplied by the neuron's 
tuning bi before computing population averages. Aside from the experimental observations of Fig. [5j 
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Figure 5. Mean percept covariance curves depend on readout ensemble /C. (a) The mean value of 
PCV curves ni{t) across neurons i in the population depends strongly on the readout ensemble /C giving 
rise to the percept, (b) The mean value of tuning- multiplied PCV curves bi'Ki{t) depends much less on 
the exact ensemble /C, only on its size. This justifies our definition for the mean PCV curve VF(t|/C) 



(eq. 16 1, (c) Relative variance of mean PCV curves, across readout ensembles /C of the same size. It 



is defined as the average of ||M^(/Ci) — M^(/C2)IP across all ensembles (/Ci,/C2) of similar size, divided by 
||Ei<-M^(/C)||^, power of the average curve across ensembles of size K. For the tuning-multiplied version 
of W{t\JC) (blue), this ratio quicky drops to zero. This is not the case for the plain mean Ei(7ri(i)|/C) 
(green) . 
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several arguments justify this operation. First, it is well-known experimentally that choice signals and 



tuning for individual neurons are often positively correlated at the population level (Britten et al. 1996 



Uka and DeAngelis 2003p . Intuitively, this is because positively-tuned neurons contribute positively to 



stimulus estimation, and conversely for negatively-tuned neurons. The strong population-wide correlation 
is indeed present in our simulated network (Fig. [sjD). As a result, the population average for bi'Ki{t) is 
expected to be mostly positive (Fig. [5]d) , which diminishes possible variations from one ensemble /C to 
the other. Second, theoretical arguments (appendix [B] and Supplementary Material S2) show that hi'Ki[t) 
is a form better suited to compute an i-population average. It can be shown to be positive under mild 
assumptions, and its laws of convergence can be related to the overall spectrum of covariance in the 
population. 

3.3 Validation on finite data 

Having shown the theoretical efhciency of the statistical quantities introduced above in retrieving the 
correct scales of perceptual integration, we now test our method on its real purpose: recovering the scales 
from incomplete experimental data (Fig.|6]). We thus limit our measures to 150 repetitions for each tested 
stimulus. Furthermore, we split our population in 5 ensembles of 100 'simultaneously recorded' neurons, 
so that noise covariance information (eq. |4| is only available between neurons belonging to the same 
ensemble. We use the same candidate values for parameters w, Ir and K as before, picking 50 candidate 
ensembles JC for each tested size K. Neurons in JC always belong to the same 'simultaneous ensemble', 
which is picked randomly. Finally, for each ensemble /C, we consider 10 additional neurons i, from the 
same 'simultaneous ensemble' but segregated from neurons in /C, to compute the PCV prediction W{t) 



(eq. 201 



The method then proceeds as above, save a couple of modifications due to the incompleteness of 



the data. First, concerning SNR computations (eq. 14 1, the estimated covariance matrix Cfc may turn 



out to be rank-deficient up to numerical precision (although it should be full-rank theoretically, since 
the number of trials (450) is larger than the largest tested size K). We thus replace its inverse 
by its Moore-Penrose pseudo-inverse, with the default numerical tolerance of our mathematical software 
(Matlab). Even so, we observe a global overestimation of predicted sensitivities compared to their 
values in the full-data case (dashed blue lines in Fig. |6]i, reproduced from Fig . [4^) . This overfitt ing is 
a well-known feature when estimating Fisher sensitivity from insufficient data ( [Raudys and Duin 1998 



Hoyle 2011) 



Second, concerning mean PCV predictions (eq. 20), our final estimates W{i) become noisy, reflecting 



the jagginess of the underlying neural measures due to insufficient trials (Fig. |6p). This jagginess is 
problematic, as it artificially increases measured values for the divergence ||W^(w, tR) — W*{w, tij)|P, which 
is our final criterion to retrieve plausible values of (w, tR). However, this eflFect can be largely compensated 
by resorting to resampling over trials (bootstrap). More precisely, for each tested parameters (w,tji), we 
may describe our noisy measures in the form: 

W^"""''\t) = W^('''="')(t) +77(0, 
VF*'('"^'^")(t) = W*'^'''"'^\t) +ri*{t), 

where r]{t) and r]*{t) describe our (unknown) measurement errors on W and W*. From this follows the 
estimate: 

g|j^(meas) _ -^*,(meas) 1 1 2 ^ ^^^/ireal) _ ^*,(reai)||2 _^ E||?7||^ + E||77*||^, (24) 

where E denotes the (theoretical) expectancy over the set of trials giving rise to measures W and W*. 
This estimate is based on the assumption that measurement errors 'ri{t) and 77* (i) are independent, which 
is likely to be the case given that W{t) stems from predictions (on the basis of neural tunings and noise 
covariances) whereas W*{t) stems from measurements of the true PCV curves. All terms involving an 
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expectancy E in eq. [24]can be estimated by resampling with replacement over the set of recorded trials. 
By computing their difference, we thus get a corrected estimate for - ||2_ rpj^jg 

estimate plotted in Fig. [6]i. A drawback of this method is that the resulting estimate may become 
(slightly) negative when the underlying match between W and W* is "too good" . However, this does not 



prevent from estimating the resulting weighting function Pw{w,tji) (eq. 23 1, accepting that some terms 
in the exponential may become (slightly) positive. 

The final results, in Fig. [6] show that our method is still able to recover the most plausible scales of 
perceptual integration. Using the same tolerances as previsouly (5% of the power of the true measures), 
we find the following final estimates: 

id — 50 ±8 msec, 
tfl, = 81 ± 6 msec, 
^ = 28.3 ±5.2, 

again very close to the true scales of the readout. Notably, K is smaller than its prediction in the 
full-data analysis: this is a consequence of the slight overfitting on estimated SNRs, which leads to an 
underestimation of the population size K required to match the psychometric SNR. The issue remains 
minor in this setup ; however we note that standard cross-validation and regularization techniques exist. 



that respectively assess and counteract the effects of overfitting (Hastie et al. 2009). 

In conclusion, the statistical method introduced above allows to overcome the missing data inherent 
to realistic recordings, by integrating information from all recorded neurons into a few reliable statistical 
estimators. 



4 Discussion 



4.1 Link with previous literature 

We have proposed a framework to interpret sensitivity and choice signals in a standard model of perceptual 
decision-making. The purpose of our study is to help understand how perceptual integration takes 
place from a full sensory neural population. This question requires, not only to compute neurometric 
sensitivities or choice signals for individual neurons, but also to integrate these measures in a single big 
picture of how information is read out from the population as a whole. 

The sensitivity to stimulus achievable by a neural population has received much attention, both ex- 
perimental and theoretical. It was progressively realized that (1) the structure of noise correlations 
influences the amount of information that can be extracted from a neural population, and (2) the lin- 
ear readout maximizing sensitivity is generally not a simple average of neural activities, but rather an 
adequate weighting optimizing the ratio between signal and noise extracted from the population, which 
corresponds to Fisher's linear discriminant (see Abbott and Dayan 1999 Averbeck et al. 2006 and ref- 
erences therein). Similarly, the role of time window w used to integrate the spike counts of each neuron 
has long been acknowledged to have a direct effect on the overall estimated sensitivity (see, e.g 
et al.[ |1992] |Uka and DeAngelis[ [2003[ [Cohen and Newsome} |2009[ [Price and B6riil|2010 ). 



Britten 



Choice signals have also received much attention since their first measurements, in the form of choice 
probabilities (Britten et al. 19961. The temporal evolution of choice signals is routinely computed to 



qualitatively establish the instants in time when a given population covaries with the animal's percept 



(de Lafuente and Romo 2006 Price and Born 2010). Recently, the specific temporal evolution of CP 



signals during a depth discrimination task has cast doubt on the traditional, feedforward interpretation of 

However, very little studies have quantitatively 



CP signals (Nienborg and Cumming 2009 



see section 



4.3) 



interpreted CP signals so far, because no analytical relationship was available to interpret their values. 
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Figure 6. Statistical recovery of readout parameters: noisy measures. Same legends as Fig. [ij but with 
modifications specific to small sample data. In panel b, the thin curves are different versions obtained 
through bootstrap resampling over trials, and the thick curve is the average across bootstrap samples. In 



panel d, the L2 norm is corrected for measurement errors, using the bootstrap samples and eq. 24 With 
this modification, our method can recover the true readout parameters on the basis of finite amounts of 
data. 
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Only recently have Haefner et al. (2013) derived the analytical expression of CPs in the standard model 



of perceptual integration (see section 4.2) 



To the best of our knowledge, only one study has explicitly proposed to jointly use sensitivity and 
choice signals, as two independent constraints characterizing the underlying neural code. In this seminal 



study, Shadlen et al. ( 1996 ) proposed a feed-forward model of perceptual integration in visual area MT 



responding to a moving dots stimulus, and studied how the population's sensitivity and individual neuron 
CPs vary as a function of model parameters such as the number of neurons, strength of noise correlations, 



etc. In section 2.2 we have formalized this intuition of Shadlen et al. (1996), by showing that sensitivity 



and choice signals are two distinct, constitutive elements of the joint covariance structure between percept 
/ and neural activity s{t) (Fig. M;). The third constitutive element is the noise covariance structure of 
s{t) itself, a result also intuited by [Shadlen et al" (1996) even though they assumed an oversimplified, 
homogeneous noise correlation matrix. 

Unlike most previous theoretical studies on the subject, we explicitly modeled all neural activities in 
time. Indeed, this is the only way of targeting the instants of sensory stimulation which contribute to 
percept formation, and thus to decipher to K~w tradeoff on sensitivity. Finally, the statistical approach 



developped in section [Z3| is, to our knowledge, the first attempt to build inhomogeneous, partial measures 
of neural activity into a quantitative interpretation of percept formation from the full neural population. 



4.2 Choice signals in realistic experiments 

Our model, as presented above, assumes a direct perceptual report of stimulus value /* on every trial. 
Real experiments generally involve a more indirect report: to allow easier task learning by the animal, 



the report is always binary. In the classic random dot motion discrimination task (Britten et al. 1992), a 



monkey is visually presented with a set of randomly moving dots whose overall motion is slightly biased 
towards the left (/ < in our notations) or towards the right (/ > 0). The monkey must then press 
either of two buttons depending on its judgement of the overall movement direction. In another classic 



task (Mountcastle et al. 1990), monkeys must discriminate the frequencies /i and /2 of two successive 



vibrating stimuli on their fingertip. They must press one button if they consider that /i > /2, and the 
other button otherwise. 

Thus, classic choice signals such as CP only measure the covariation between the spike train of each 
neuron and the animal's binary choice c on each trial. To infer anything about the animal's underlying 
percept f* , it is also necessary to assume a behavioral model describing how the monkey takes a binary 
decision, on every trial, on the basis of its sensory percept. Most often, this behavioral model is implicitly 
assumed to be optimal. For example, in the random dot motion task, it is generally assumed that c — 
H{f*) (Heavyside function), which is clearly the optimal policy if the animal has no further information 
about /. In the two-frequency task, the optimal behavioral model would be c = H{fi — /2 ). However, in 
the real experiment, the monkeys have to memorize /i for a few seconds before /2 is presented, so potential 
effects of memory loss may also come into play. More generally, behaving animals can display biases, 
lapses of attention, various exploratory and reward-maximization policies that lead to deviations from 
the optimal behavioral model. To summarize, choosing a relevant behavioral model is a connex problem 
that cannot be addressed here, and that will vary depending on the task and individual considered. 

However, for most tractable behavioral models, the predicted sensitivities and choice signals will 
ultimately rely on the quantities introduced in this article. To take the simplest example, we focus on the 
random dot motion task with optimal policy c = H{f*) — as assumed in most models of the task — and 
make the classic assumption that the statistics of /* (given /) are Gaussian (Fig. [7^). This model predicts 
the following psychometric curve (probability of button presses as a function of stimulus value) : 



E(c|/) = p(r >oi/) = $(V^/), 



where $ is the standard cumulative normal distribution, and Z* is the square SNR for /*, as defined in 
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eq.[l] Thus, Z* used in our model can easily be retrieved from experimental measures of the psychometric 
curve. 

Same results hold for choice signals. Generally, choice signals are directly computed over some tempo- 
ral average 'si of the underlying spike trains. Choice probability for every neuron i measures the area under 
the ROC curve between the two distributions of , respectively conditioned on c = and c = 1 ( Green 



CP, 



and Swets 1966). Recently Haefner et al. (2013) have shown that, assuming (1) multivariate Gaussian 
statistics between s and /*, and (2) the optimal behavioral model c = H{f*), choice probability can be 
analytically expressed as: 

1 ^ V2 < 

2 TT a{si) 

a formula virtually exact over the full range of plausible CP values. The rightmost fraction is nothing 
but the Pearson correlation between variables Si and /*. The numerator involves the linear covariance 
between and /* which is, in our notations, the temporally averaged PCV curve tt*. The authors further 
derived that, in the standard model of percept formation with readout vector a, this term is given by 
TT* = Ca, which is exactly the PCV characteristic equation (eq. [9| in its temporally-averaged form. The 
CP formula involves a normalization by cr(s7), the standard deviation of spike count Si". This prevents 
from a straightforward extension of the formula in time, because cr(si) tends to infinity as the integration 
window used to compute s7 tends to zero. 



A simpler measure of choice signals is the choice-conditioned difference in firing rate ( Britten et al.[ 
1996), which can be computed for every individual neuron i as Ai{t) := E{si{t) \c — 1) — E(si(t) |c = 0). 
Under the same assumptions as above (Gaussian statistics for s(t) and /*, optimal behavioral model), 
this difference can be analytically expressecj^as: 



(25) 



This is very close to the CP formula, but without the additional normalization by cr(si). Thus it directly 
allows for a simple generalization to temporal signals. Since Ai(t) is easily computable from experimental 
data, it provides the easiest way of accessing the underyling PCV curves ■7T*{t) used in our article. 



4.3 Model hypothesis 
4.3.1 Linear integration 

The readout model (eq.[6]) used to analyze sensitivity and choice signals is an instalment of the 'standard', 
feedforward model of percept formation. As such it makes a number of hypothesis which should be 
understood when applying our methods to real experimental data. First, it assumes that the percept /* 
is built linearly from the activities of the neurons. There is no guarantee that this is the case during 
real percept formation, but linearity is an unavoidable ingredient to make quantitative predictions at the 
population level. Even if the real percept formation departs from linearity, fitting a linear model will 
most likely retain meaningful estimates for the coarse information (temporal scales, number of neurons 
involved) that we seek to estimate in this work. 

More precisely, the model in eq. [6] assumes that spikes are integrated using a kernel separable across 
neurons and time, that is A'^{t) — a^hw{t). Theory does not prevent from studying a more general inte- 
gration, where each neuron i contributes with a different time course A^(t). The readout's characteristic 
equations are derived equally well in that case. Rather, assuming a separable form reflects (1) the in- 
tuition that the temporal components of integration are rather uniform across the population, and (2) 

^Relying on the general formula E(Xi|X2 > 0) = V2tt~^p, applicable to any bivariate normal variables (Xi,X2) with 
means 0, unitary variances, and correlation coefficient p. We note that the assumption of normality is violated at small time 
scales because Si{t) is clearly not Gaussian in that case. However, in practice, Ai{t) is always computed with a minimal 
amount of temporal smoothing which resolves this potential issue. 
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of a variable readout time 
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Figure 7. Discussion, (a) Classic behavioral model. If the task is to judge whether / > 0, the optimal 
behavioral policy consists of the simple threshold rule c = H(f*) (Heavyside function). Furthermore, the 
trial-to-trial distribution of percept given / (distributions with different colors) is generally assumed 
to be Gaussian. Under these hypotheses, sensitivity and PCV signals used in this article are directly 
computed from real experimental data (neurometric curve and choice signals), (b) If readout time tn 
varies strongly from trial to trial (with density g{t)), it leads to a flattening of PCV signals (thick green 
curve) compared to the case with deterministic tji (dashed green curve), (c) If a decision-related signal 
feedbacks into sensory areas, it leads to a divergence of PCV signals (thick green curve) after the readout 
time tj^, compared to the case without feedback (dashed green curve). 

the impossibility to fit a model with general kernel A'^{t). Instead, we summarize temporal integration 
from the population by two parameters w and t^, opening the door to a reliable estimation from data. 
Although the integration shape h could also be fit from data in theory, it seems more fruitful to assume 
a simple shape from the start (a classic square window kernel in our applications). Given that our goal 
is to estimate the coarse scales of percept formation, our method will likely be robust to various simple 
choices for h. As a simple example, we tested our method, assuming a square window kernel, on data 
produced by a readout using an exponential kernel, and still recovered the correct parameters w, tji and 
K. 



4.3.2 Non-deterministic extraction time 

Our model, as presented above, makes another important assumption: that perceptual readout occurs at 
the same time Ir on every stimulus presentation. This assumption is likely to be valid in perceptual tasks 
that allow a fast reaction from the animal ('reaction time' tasks), in which case tfi will generally be as 
small as it can get (see, e.g., Stanford et al.[ 2010). However, when sensory stimulation lasts longer (say, 
over 500 msec) it opens the door to variations in tfj from trial to trial, or even to several reactualizations 
of percept /* during the same trial. For example, imagine that the stimulus is a particular RGB color on 
a monitor, and you are asked to judge whether it contains more green (G) or blue (B). From intuition, 
we can tell that our performance in such a task will not sensibly increase whether we watch the color 
for one second or one minute. In our model's formalism (eq. [6]), this reveals built-in limitations on the 
effective integration window w that we can use in the task (remember that the readout's performance is 
proportional to w). But then, if our percept arises from a limited integration window w and we indeed 
watch the color for a full minute, when is our percept built? 

In appendi x [A] we derive a more general version of the characteristic equations (eq. 7]|9) assuming 
that tji in eq. |6j is itself a random variable, drawn on each trial following some probability distribution 
g(t). Because sensory neurons have rather stationary activities in time, this additional assumption 
does not strongly affect the readout's sensitivity. On the other hand, it affects strongly PCV curves. 
Essentially, the resulting PCV curve resembles a convolution of the deterministic PCV curve by g{t) 
(Fig. [7|d, section A. 2.2). If g{t) is substantially distributed in time, the PCV curves will become broader, 
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and flatter. In practice, this means that if a behavioral task is built such that can display strong 
variations from trial to trial, the statistical method introduced above will produce biased estimates. In 
theory, this issue could be resolved by adding an additional parameter in the analysis to describe g{t) 



(see section A. 2. 2 1, but the validation remains to be done. 



4.3.3 Top-down influence of choice 

Finally, our 'standard' model assumes that percept formation is exclusively feed- forward. The activities 
Si{t) of the sensory neurons are integrated to give rise to percept / and the animal's resulting choice 
c, and this forming decision does not affect sensory neurons in return. Recent evidence suggests that 
reality is more complex. By looking at the temporal evolution of CP signals in V2 neurons during a 



depth discrimination task, Nienborg and Gumming ( 2009 ) evidenced dynamics which are best explained 



by a top-down signal, biasing the activity of the neurons on each trial after the choice is formed. In 
our notations, the population spikes Si{t) would thus display a choice-dependent signal which kicks in on 
every trial after time tn, resulting in PCV signals that deviate from their prediction in the absence of 
feedback (Fig.[7}:). 

What descriptive power does our model retain, if such top-down effects are strong? The answer 
depends on the nature of the putative feedback. If the feedback depends linearly on percept / (and thus, 
on the spike trains), its effects are fully encompassed in our model. Indeed, this feedback signal will 
then be totally captured by the neurons' linear covariance structure "fij{t, s), so that our predictions will 



naturally take it into account. This is also the case if the oddity noted by Nienborg and Gumming (2009) 
is due to global shifts of neural excitability from trial to trial. On the other hand, if the feedback depends 
directly on the choice c — which displays a nonlinear, 'all-or-none' dependency on / — then it will not be 
captured by our model, and lead to possible biases. Even so, the effects of the feedback could be largely 
alleviated through a small trick: compare true and predicted PGV signals only up to (candidate) time 



tn (see eq. 22 ) 



4.4 Extrapolation to larger neural ensembles 

Gan we understand in more depth the statistical principles at work underneath our method of estimation? 
What factors govern the evolution of sensitivity Z(IC) (Fig. |4^, eq. 14) and mean PCV signal VF(i|/C) 



(Fig. |4j3, eq. 16), as a function of the number of neurons K used for readout? This question is not only 
of theoretical, but also of practical interest. Indeed, it may happen in real applications that the number 
of simultaneously recorded neurons -ft'max is too small to observe the crossing of predicted and true SNR 
curves (Fig. In such a case, predictions will be biased because no recorded ensemble IC can readily 

account for animal sensitivity. 

What predictive power do ensembles up to size i^max contain about larger ensembles? For example, 
can we extrapolate the shape of the mean SNR curve (Fig. |4^) to K > K^^^l In appendix [b] we address 
this question theoretically, by studying the value of SNR and PGV signals as a function of ensemble 
size K, and of the general structure of activity in the population. Our study relies on the singular 
value decomposition (SVD) of neural activity in the population. The SVD reveals a set of m = 1 . . . M 
independent modes of population activity, each mode being associated to a power and a sensitivity 
r/^. Essentially, the sensitivity embedded in a neural ensemble JC of size K increases as the sum of 
sensitivities for the K first modes in the population — which are the modes with the largest powers A^. 
Conversely, the overall power of PGV signal W{t) decreases as the average value of A^„ in these K first 
modes, weighted by their respective sensitivities. 

Because there is no general relationship between the power A„i of a mode and its sensitivity to stimulus 
r],n, there is no trivial way of extrapolating SNR and PGV predictions to ensemble sizes K that were 



^Actually, this is always bound to happen for smaU tested parameter w, following the K-w tradeoff. 
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not monitored simultaneously. Any such extrapolation can only be done through specific assumptions 
about the link between Am and 1].^ — which essentially amounts to characterizing the relative embedding 
of signal and noise in the full population (Wohrer et al. 2012). For example, it is classically assumed 



that the noise covariance matrix is "smooth" with respect to the signal covariance matrix, so that the 
former can be predicted on the basis of the latter (Wohrer et al. 2010 Haefner et al. 2013). Thus, while 



extrapolation of the statistical method above to larger populations is not trivial, it can be performed 
under specific assumptions about the embedding of signal and noise in the population considered. 



5 Conclusion 

We have shown how classic data recorded during perceptual decision-making experiments can be inter- 
preted as samples from the joint covariance structure of neural activities and animal decision. Assuming 
a standard linear model of percept formation from neural activities, we derived a set of characteristic 
equations which relate neural and perceptual data, and thus define implicitly the parameters of perceptual 
integration by the animal on the basis of its sensory neurons. The neural data consist of neural PSTHs 
(first moment of neural activities) and JPSTHs (second moment of neural activities). The perceptual data 
consist of the animal's sensitivity, and of each neuron's covariance with the animal's choice — a quantity 
often assessed through choice probabilities, and for which we proposed a simpler linear equivalent coined 
percept covariance (PCV). 

We then proposed a method to utilize these characteristic equations in a case of practical interest, 
when experimenters only have access to finite statistical samples of neural data across the full population. 
Our goal was to successfully recover the instants in time and the typical number of neurons being used for 
percept formation — a difficult problem which cannot be solved on the sole basis of sensitivity information, 
due to the "K-w tradeoff". Our method relies on statistical averages of predicted sensitivity and PCV 
signals arising from random, candidate neural ensembles used as the source of percept formation ; and 
seeks to match these predictions with the true, recorded perceptual data. We tested this method on an 
artificial neural network producing a form of stimulus encoding, and showed that it successfully recovers 
the scales of perceptual integration, on the basis of sample recordings of realistic size. 

Our method opens the way to novel experimental assessments of percept formation in sensory decision- 



making tasks. Indeed, the two main quantities used in our statistical analysis — sensitivity Z (eq. 14 ) and 



mean PCV curve W{t) (eq. 16) — rely on classic experimental measures. The main limitation of our 
approach is the size of candidate readout ensembles which can be considered, as they should necessarily 
have been recorded simultaneously However, the number of simultaneously recorded neurons is constantly 
pushing upwards with modern experimental techniques, so we may expect that this limitation, if it 
exists, will soon be overcome. Furthermore, through a theoretical analysis based on the singular value 
decomposition (SVD) of neural activities, we showed the possibility of extrapolation to larger ensemble 
sizes than those simultaneously recorded, although such extrapolations can only be done under specific 
assumptions, and on a case-by-case basis. For all these reasons, our method can readily be tested on 
real data, and hopefully provide new insights into the nature of percept formation from populations of 
sensory neurons. 
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Appendices 



A Characteristic equations for the readout 
A.l Derivation 

We here derive the characteristic equations for the hnear readout introduced in the main text, and further 
comment some of its properties. We consider a more general version of eq. |6j where the extraction time 
tf( is aUowed to vary from trial to trial. We thus assume that tn is itself a random variable, drawn on each 
trial according to some density function g{t), independently of neural activities s{t). The full readout 
model then writes: 

X!/ a's,{tR- u)h^{u)du, (A.l) 
, Ju>0 

tR^g{t). (A.2) 

This model naturally encompasses the simpler version presented in the main text, with a deterministic 
time tu: it corresponds to taking g(t) as a Dirac function located on that deterministic value. 

The characteristic equations fo r thi s model rely on a straightforward computation of the second order 
statistics of /, starting from eq. A.l To deal with random time t^^ we note that for any random 
process X{t) independent of i^, E(X(<j^)) = ^^^^g{t)F.{X{t))dt. This expression is valid only if is 
independent from the random variables contributing to X (in our case, the spike trains). 

Then, the expected value of / given a stimulus / writes: 

E(7(ifl)|/)= l9{t)Y.I a'Hs^[t~u)\f)K{u)dudt 

Jt j Ju>0 

= y2a' f{9*h^){t)\{t;f)dt, (A.3) 

where g -k h^{t) — g{u)h^{u ~ t)du is the temporal correlation between g and /im, and Xi(t; f) is the 
PSTH for neuron i in stimulus condition /, defined as in the main text (eq. |2|. 
Similarly, the expected value of given a stimulus / writes: 



u)hw{u)du 



f]dt 



Jt j^- J J{u,v)>0 



)\f)hw{u)hyj{v) du dv dt 



VaV // G^{t,s)T^,j{t,s:J)dtds, (A.4) 



where we have defined Gyjlt,s) :— j^g{u)hw{u — t)/i^(u — s)du, and rjij{t,s]f) := Yi{si(t)sj{s)\f). 
r]ij{t,s]f) is very related to the covariance structure in the population. It corresponds to the "plain" 
JPSTH for the neurons in stimulus condition /, before correcting by the so-called "product predictor" 



(Aertsen et al. 1989) 
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Finally, the expected value for the product of / and the activity of any neuron Si{t) writes: 
E(/(tfl)s,(i)|/) = f 9{s)J2 f a^E{s^{t)s,is-u)\f)hUu)duds 

Js J Ju>Q 

using the same notations as above. 



(A.5) 



The three expressions eq. A.3|A.5| roughly correspond to the three characteristic equations for the 
readout. To obtain them, we consider the variational versions of the previous expressions. First, we 
obtain the characteristic equation for tuning by differentiating eq. |A.3| with respect to stimulus. Second, 
equations A. 4 and A.5 are expressed in 'product' form E(Ary), whereas the corresponding characteristic 
equations are expressed in 'covariance' form Gav{X,Y) = 'Ej{XY) — E(X)E(y). Once this is done, and 
after some rearrangement of the terms, we obtain the characterisitic equations for tuning (eq. A.6), 
sensitivity (eq. A. 7) and percept covariance (eq. A. 8): 



a;E(/|/) = Va^ [{g^h^){t)mdt, 
(Var(7|/))/ = 0^0^ ( / / G„(t, sH,{t, s) dtds + Vlf^^) , 
{Cov{f,Si{t)\f))f = J2'^^J^i9*hyj){s)j,j{t,s)ds. 



(A.6) 
(A.7) 
(A.8) 



In eq. A.6 (3i{t) is the temporal tuning curve for neuron z, defined as in eq. |3] If the readout is unbiased, 
the left-hand side is equal to 1, as in the main text. In eq. |A.7 and A.8 7ij(i, s) is the covariance structure 
(JPSTH) betwee n neu rons i and j, defined as in eq. |4] 



Finally in eq. A.7 matrix V*^^"^^ is an additional source of variance that appears only when g{t) has 



an extended temporal support, i.e., when tji is non-deterministic. It then writes: 



temp 



,(t,s)((A,(i;/)-A.(/))(A,(s;/)-A,(/) 



dt ds. 



A.3). Thus, V//'"'' 



where Xi{f) := j^{g * hw){t)\{t] f)dt is the temporal average already used above (eq 
measures a form of temporal covariance in the PSTHs for the neurons. 

When tu is deterministic, as in the main text, we have g{t) = 5{t — tj^), a Dirac function. Then, the 
temporal integration kernels used in eq. A.6|A.8 boil down to (g ★ hiu){t) = /iu,(tj?, — t) and Gw{t,s) = 
hwitR — t)hw{tji — s). One checks easily that in these conditions, the additional temporal variance term 
y^^'^p vanishes, and we recover the characteristic equations from the main text. 

A. 2 Additional interpretations 
A. 2.1 Sensitivity as a function of w 



In the form of eq. 



A.7 



it is not clear how the value of w influences the variance of /, and thus the 



readout's sensitivity. To get a better intuition, let us first neglect the temporal variance term V^^"^^ . 
One checks easily that kernel {t, s) , introduced above, verifies the following property: {t, t+T)dt 
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{hw ★/i^)(t), the autocorrelation of kernel h^^. As a result, we can rewrite A. 7 in the form : 

(Var(/|/)); = J2 /e^™ * '^»)(^) ( / ^hltiur) ^''^^' ^ ^ ''^'^^^ '^^ 

= a'a^ / {h^ -k h^){T)j-{T)dT. (A.9) 

In the first line, the function of t defined by the fraction is positive and has an integral of 1, so it operates 
as a temporal averaging on ^ij(t,t + r). The resulting average over t, noted 'yijir) in the second line, is 
thus a form of cross- correlogram between neurons i and j, measuring the average covariance between the 
spikes from i and j separated by a time lag r. 

Because is a low-pass kernel with scale w, its autocorrelation function typically has support on 
[— w, w], and verifie^ {h^ ★ ^t«)(0) = . On the other hand, Jijir) typically has support on some 
interval [— t^,t^], where is the typical time scale of noise correlations in the population. As a result, 



as soon as w gets bigger than r^, the integral in (A.9) starts behaving like w \ an d the SNR of / scales 
as w. A similar analysis can be performed on the additional term V^^™"^ (eq. A.7|. 



A. 2. 2 Non-deterministic tfi 

What are the main departures from the main text when function g{t) has an extended temporal support? 
From eq. A.6||A.8 it is clear that the general form of the characteristic equations still holds: 



1 = b^a, 
Z-^ = a^Ca, 
7r(t) = r(t)a, 

but with more general definitions of b{w^ g), C{w, g) and r(f|w, g). First, an additional covariance matrix 
■ytemp y^g^y contribute to C, if neural activities are not stationary in time. Indeed, if tn varies from trial 
to trial, any variation of firing rates in time creates an additional source of variability in /. 



Second, through eq. A. 8 g{s) acts a weighting factor over the PCV curves that would be obtained for 
each tft;. T{t\g) = g{s)T{t\tii = s)ds. This leads to the spreading of PCV curves sketched in Fig. [Tj;. 

These two features lead to lose one specific property of the deterministic case. When the "natural" 
temporal averaging of PCV signals was considered, that is 7f = hwitu — t)iT{t)dt, the integrated PCV 
equation yielded 7f = Ca, because F = C. In the general case, the "natural" temporal averaging is 
7f = Jf jg * hni){t)T^{t)dt, and one checks easily that F 7^ C. Thus, with general g{t), the sensitivity 



(eq. |A.7|) and PCV (eq. A. 8 1 equations become more dissociated. 



In these conditions, it is unclear whether the statistical approach introduced in the main text could 
be extended, to also recover a non-deterministic extraction function g{t). The main concern is that the 



temporal evolution of PCV signals is only determined by the aggregate function {g*hyj)(t) (eq. A. 
which cannot be used to disentangle g{t) and w separately. However, general considerations suggest that 
the method could still work in that case. Indeed, the respective effects of g{t) and w on the covariance 
structures used in eq. |A.7||A.8| can roughly be thought of as a scaling: 

T^eiw,g)C, (A.IO) 

because the overall "shape" of covariance between neurons (as opposed to its "strength" ) does not depend 
much on the precise temporal integration used to compute their activity. Actually, under the specific 



^Assuming proper scaling for shape function h: h(u)du = h{u)'^du = 1. 
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assumption that "fij{t, s) — ^ijQQt — s|) (stationary activities with miiform temporal correlations), rela- 
tionship (eq. A. 10 1 can be shown to be exact, with 



expressed in terms of Fourier transforms. As a result, the mean PCV curve W{t) (eq. 15 16) is predicted 
to scale as e{w,g). So, while matching the temporal support of W{t) and W*{t) constrains the value of 
(g -k hjj^){t), matching their overall power constrains e{w,g), and we can hope to disentangle the values 
of g{t) and w separately. In practice though, this would require the fitting of at least one additional 
temporal parameter ; typically, the standard deviation of from trial to trial. 



B Singular value analysis — summary 

We summarize here the main results of a theoretical analysis to understand the evolution of SNR and 
PCV signals achieved by readout ensembles of growing size K . Detailed mathematical derivations are 
available in Supplementary Section S2. For simplicity we focus only on time-integrated neural activities 
s7 := hw{u)si{tii — u)du, assuming a fixed choice of {w,tji). We consider random readout ensembles 
JC in the population, and two resulting indicators. First, we consider the sensitivity Y{IC), linked to 
SNR Z by relationship Y — Z{1 + Z)^^ . This is the natural description of sensitivity in the framework 



below. It is obtained like Z in the main text (eq. 14) but replacing the noise covariance matrix C by 
the total covariance matrix A = C -I- {f'^)bb^ . Second, we consider the mean PCV in the population 



W{]C), obtained as the "natural" temporal integration of signal W{t\IC) from the main text (eq. 16): 
W :— J h^^{u)W{tii — u)du. Since W{t) is mostly positive, W roughly corresponds to the overall power 
in W{t). 



SVD reformulation of neural activity. The analysis relies on the singular value decomposition 
(SVD) of population activity into m — \ . . . M orthogonal modes: 

M 
m— 1 

where the lower index i = 1 . . . A^tot indicates neurons in the population, and the upper index indicates 
all possible stimuli / and random realizations lu of network activity. Each mode m is defined by its 
power Am > 0, its distribution vector (over neurons) u™, and its appearance variable Vm which takes a 
different random value on every trial. By construction, the various modes are orthogonal across neurons 
((u'")^u" — 5™"), and linearly independent across trials {Gov f^{vm,Vn) = S^n), so they typically 
correspond to distinct "patterns of activity" in the population. The power Am describes the overall 
impact of mode m on population activity. We assume Ai > • • • > Aj\/, so we progressively include modes 
with lower power — either because they involve only a small fraction of neurons, either because they appear 
only on rare trials. The number of modes M is the intrinsic dimensionality of the neural population's 
activity. In real populations we expect M « Ntot, because neural activities are largely correlated. 

The SVD is best viewed as a change of variables reexpressing neural activities {si}i=i...iVt„t in terms 
of mode appearance variables {vm}m.=i...M- Just like individual neurons, each mode m can be associated 
to a sensitivity to stimulus 77™, which describes the proportion of the mode's power Am due to variations 
of the signal (/), as opposed to variations of the noise (w). Since modes are linearly independent, the full 
population's sensitivity corresponds to the sum of individual mode sensitivities: Y{oo) = X^m'/m- 
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Sensitivity and PCV from finite neural ensembles. We now want to estimate the amount of 
stimulus sensitivity Y{IC) tliat can be extrated, not from the full population, but from neural subensembles 
of size /C. The SVD provides a natural reinterpretation of this problem in terms of activity modes: each 
ensemble /C "reveals" only a fraction of the underlying modes. The pivotal object to perform this 
reinterpretation is our so-called data matrix: 

f m=l...M 

an M X K matrix describing the activity of neural ensemble K. in the space of modes. In the original 
problem formulation, the K x K matrix D^D describes the covariance of neural activity in ensemble 
/C, and we want to estimate the resulting sensitivity. In the dual formulation, the M x M matrix DD^ 
describes a covariance structure between modes, but estimated only from the sample neurons in K.. The 
problem now lives in a space of fixed dimensionality M, and can be related to classical problems of 
estimating covariance structures from a finite number of samples — in our case, the neurons. 

Applying this dual approach, we find that Y{JC) and W{JC) depend on readout ensemble JC only 
through an Af X M matrix A.jc, the (rank K) orthogonal projector on the span of vectors {dijig^; in 
mode space: 

W(IC) = -B^ + {NtotY{JC))-^v'^A^A,cV, 

where :— Ei(6?) is the average square tuning in the population. Furthermore, the average projector 
Aye across ensembles of size K, that is E^A, is approximately diagonal in mode space. Noting {e^} for 
its diagonal, we thus obtain the approximations: 

M 

EkY 0, e]}r,l, (B.l) 
EkW -B' + jVrot^ ^-r/ ^I","' ^ (B-2) 

Z^m=l ^kVti 



where < < 1 is the average "proportion" of mode m revealed by K random neurons. As modes with 
larger power A™ tend to be revealed first, a rough but useful image is to consider that ~ lm<K — only 
the K first modes are revealed by ensembles of K neurons. 

Thus, sensitivity E^i^ grows with K as mode sensitivities rjm are progressively revealed. Saturation 
occurs when all nonzero rj^ are revealed, in which case EkY = Y{oo). Conversely, the mean PCV Fij^W 



decreases with K. Indeed, the fraction in eq. B.2 can be viewed as an average power {X'^)m.K, where 
each mode m contributes with a weight e^rj^. As progressively reveals modes with lower power Am, 
this average power is expected to decrease with K. Again, saturation occurs when all nonzero rjm are 
revealed, and then EkW = Z{qo)~^ B'^ , the predicted value for choice signals in case of optimal readout 



from the full population (Haefner et al. 20131. 



Extrapolation to large K. What do these results tell us about possible extrapolations to ensemble 
sizes K larger than the maximum number of neurons simultaneously recorded by the experimenter? 
Essentially, that such extrapolations always require further assumptions about the structure of activity 
in the population. 

Indeed, one can imagine scenarios in which the most sensitive modes (those with highest ry^) are 
associated to relatively low powers A^ and thus, appear only at large K. This could be the case, for 
example, if a very local circuit of neurons carries a lot of information about the stimulus, independently 
from the rest of the population. Because it involves few neurons, the corresponding mode of activity 
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will have a low power \f^, and will require very large ensembles K. to be detected — simply because the 
corresponding neurons are not recorded in smaller ensembles. A similar discussion can be found in 
( 2013) . Another example is the encoding network theoretically proposed by|Boerlin and 



Haefner et al. 



Deneve (20111, where each neuron spikes only if its information is not already encoded in the activity 
of the remaining neurons. This results in the appearance of a few, global modes of activit}!^ which are 
specifically designed to have a very large SNR, meaning high r]m and low A^. In this case, any estimation 
of sensitivity from a subpopulation JC will consistently be smaller than the full population's sensitivity. 

To summarize, extrapolation can only be performed under additional assumptions about the overall 
link between r]m and Am — or equivalently, about the relationship between 'signal' and 'noise' contributions 



to population activity (see also discussions in Wohrer et al. 2012 Haefner et al. 2013). The extent to 



which such assumptions are justified will depend on each specific context. 
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SI Encoding network 

We detail here the architecture of the artificial encoding network used to test our method (summarized 
in section 3.1 from the main text). This ad-hoc network was designed to display some classic features 
of sensory cortical neurons involved in perceptual decision-making tasks (e.g, V2, MT, SI, S2...). To 
reproduce the diversity of response naturally observed at the population level ( Wohrer et al.[ 2012) 



neurons in our network have broadly distributed firing rates, and some diversity in their temporal response 
profiles. We also wished to reproduce the continuum of tuning to stimulus observed in real populations, 
where some neurons have positive tuning to stimulus (rate increase when / increases) , and other neurons 
have negative tuning. Finally, we wished to reproduce realistic strengths of noise correlations between 
neurons in the population (Figure 3b from the main text), and insure that the tunings of each pair of 
neurons (their "signal" correlation) be only slightly predictive of their noise correlation — another feature 



often observed in real sensory populations (Wohrer et al. 2012) 



The network consists of two distinct layers of spiking neurons, of which only the second layer (encoding 
layer) is "visible" to the experimenter. The first layer (LI) consists of 100 = 2 x 50 independent Poisson 
neurons, whose firing intensity / constitutes the stimulus encoded by the second layer. On each trial, / 
takes one of three possible values 25, 30 and 35 Hz. All neurons are equivalent, but segregated in two 
distinct populations according to their projections on the second layer. The Poisson firing constitutes 
the only source of randomness in the network from trial to trial. 

The second layer (L2) consists of 500 leaky integrate-and-fire (LIF) neurons, some of which receive 
input from LI, and who are all coupled through a sparse, balanced connectivity. The generic equation 
for these neurons writes 



The neuron emits a spike at each time when V^'^^ reaches threshold V*'^^, after what the neuron's 
potential is reinitialized at resting value y^^"*. AH neurons share the same membrane time constant 
T = 20 msec, threshold = —50 mV, and resting potential V^'^'^* = —60 mV. Upper index s denotes 
one of three possible subtypes of neurons in L2: Positively-biased neurons (s = p, 100 neurons), negatively- 
biased neurons (s = n, 100 neurons) and unbiased neurons (s — u, 300 neurons). 

Positively-biased neurons receive sparse excitatory connections from 50 neurons in LI {W^l'"^ > 0), 
whereas negatively-biased neurons receive sparse inhibitory connections from the 50 other neurons in LI 
^^ji.") Qy Unbiased neurons receive no direct input from LI {Wjj'^^ = 0). As these asymmetries 
create biases in the total synaptic inputs to each type of cell, the intrinsic currents and I*^"-' 

also vary depending on neuron subtype, to insure homogeneous firing properties inside the three popu- 
lations (see Table ll|. Finally, all L2 neurons are connected through a single matrix W*^^-' of recurrent 
connections — independently of their subtype. All connection matrices W^^'*^ and W^^-' are sparse with 
(Erdos-Renyi) connection probability p = 0.2. Non-zero connection strengths are picked uniformly in an 
interval [wmin, Wmax]; which depends on the connection considered: see Table [l] Note that L2 recurrent 
connections can be both excitatory and inhibitory, a departure from biology which allows for an easier 
implementation. 

Finally, the recurrent connections in L2 are associated to synaptic delays: for each pair (i, k) of con- 
nected L2 neurons, the random delay Aki is drawn uniformly between and 5 msec. This substantially 
increases the diversity of neural responses in the population, particularly at the level of JPSTHs (Figure 
3e from the main text) — this is interesting because our method is specifically designed to analyse generic. 



33 



Subtype ^t'min'' ^«max Wmin ^max 



Pos. biased (p) 2 -2 2 

Neg. biased (n) 14 -3 0-2 2 

Unbiased (u) 5 -2 2 



Table 1. Connectivity parameters in the three subtypes of L2 neurons. All values are expressed in 
millivolts. 



heterogeneous population activities. 



We implemented and simulated the network using Brian, a spiking neural network simulator in Python 



(Goodman and Brette 2008). Our simulation consisted of many successive epochs of 500 msec with all 
possible successions of the three stimulus values / (as in Figure la from the main text). Since the input 
Poisson neurons were always firing close to 30 Hz, there was no strong transient at stimulus onset as is 
often observed in real sensory neurons. In our case, the change of activity between two successive stimuli 
was always only differential, and rather weak (see Figure 3c from the main text). 
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S2 Singular value analysis 

We detail here our matheniatical analysis to understand the evolution of SNR and PCV estimates in 
growing populations of size K, as a function of the underlying structure of the full population. These 
results expand the condensed presentation proposed in appendix B of the main text. 

1 Notations 

1.1 Activity across neurons, stimuli and trials 

For simplicity, we consider a timeless version of neural activities, although the whole analysis could be 
extended to include time as well. In our readout framework, this means that we fix some candidate 
temporal integration parameters {w,tfi), and consider the resulting neural activities 5^, constructed from 
the temporal integration of each neuron i's spike^ 




Since our main results have been presented in the case of linear tuning to stimuli, we stick to this 
hypothesis. This implies that all signal/noise properties can be understood by considering only two 
stimuli (as the difference in response between these two stimuli totally defines the linear tuning of each 
neuron). We thus note / = {0, 1} the two possible stimulus values which can be input to the network. 

Finally, we may want to consider the possibility of imprecise neural measurements, due to recording 
from only a finite number of trials (although it is not the main concern of this note). We thus denote 
Lj E fl the set of all possible different realizations of network activity. In theory, D is an infinite set of 
possible events. However, we will formally assume it to be finite, with (huge) cardinality — so on a 
given trial, each possible network realization uj has a probability of coming out. 

We thus summarize all possible network realizations through the array 5/" , where i — 1 . . . N denotes 
all neurons in the populatioij^ / = 0, 1 denotes stimulus value, and w = 1 . . .fl denotes all possible 
realizations. The notation /w, somewhat abusive, applies the same indexing uj for possible realizations 
in both stimulus conditions / = and / = 1 — which can only be done if both stimulus conditions allow 
the same number D, of possible network realizations. However, given the formal nature of ensemble (D, 
this notation abuse appears harmless. 

As we start doing statistics across neurons and trials, we will need to compute expectancies (i.e., 
means) and covariance structures across various dimensions. In all cases, we apply the generic notation 
^a{-^a,p....) to denote the empirical mean of quantity Xa,i3,... when a is varied over ensemble A {(3, . . . 
being any other parameters that are held fixed). When ensemble A is unambiguous, meaning that 
it includes all possible values for a, we will omit it. Finally, second order variances and covariance 
structures will generically be computed as Cov^{Xa, Ya) — E^(AqYq) — E^(Xq)E^(Yq). 

As a first application of these notations, remember that the whole sensitivity analysis derived in 
the main text deals only with variations: the "signal" measures variations of activity with a change in 
stimulus /, while the "noise" measures variations of activity across trials uj. Thus, the overall mean level 
of activity for each neuron i, that is Ef^{S('^), plays no role in the analysis: it always disappears from the 
computations of tuning and noise covariance structure. To clarify further notations, we can thus offset 
all neural signals and assume that Ef^{S{'^) = 0, for every neuron i in the population. 

1.2 Modes of activity in the neural population 

The key argument of this note relies on interpreting 5*/" as a very large N x (217) matrix, and considering 
its singular value decomposition (SVD). The (compact) SVD is a standard decomposition which can be 
applied to any rectangular matrix S. It writes S = UAV^, where A is an M x ill diagonal matrix with 

^Si is noted 37 in the main text. 
^A'^ is noted A^tot in the main text. 
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strictly positive entries Am (the singular values) , U is an x M matrix of orthogonal columns (meaning 
U^U = Mm), and V is an 17 X M matrix of orthogonal columns (meaning V^V = Idjv/)- 
With our current definition of neural activity S, the SVD decomposition writes 

M 

st = E ^""r«l", (1) 

m— 1 

where the orthogonality of U writes: 

N 

V(m,n), E "r< = '5™"' (2) 

and the orthogonality of V similarly writes 'Ylifui'^m'^L^) — ^mn- In the case of V, our above convention 
that E/t^(S'/") ~ for all neurons i actually imposes that F,f^{vlf) — for all modes m. We thus 
reinterpret the orthogonality of V as a linear independence between the different random variables u,„: 

Vm,E/^(z;^-)=0, (3) 
V(m,n),Cov/„(u4",w;{'") = 5™„. (4) 

Note that we reinterpret the sum over trials (/, lo) as an expectancy (thus rescaling A„j by ensemble size 
2f]). This allows to emphasize the statistical interpretation of the SVD decomposition in this case. 

Each triplet {Xm,u"^ ,Vm) defines one particular mode of activity in the population. We call the 
power of the mode, u™ (viewed as an iV-dimensional vector) its distribution vector, and Vm (viewed as 
a scalar random variable) its appearance variable. The appearance variable — which takes a different 
value on every repetition of the experiment — describes the probability of appearance of each mode 
TO across stimuli and trials. Through eq. |4j each mode to verifies Eyij((w4")^) = 1, meaning that all 
modes have the same overall "expected appearance" across trials. 

Similarly, eq.p^implies that J2t (("D^) = 1. 

so u™ describes the normalized distribution of the mode 
across the neuralpopulation. Some modes m may correspond to a rather homogeneous distribution of 
(m™)^ across the population, meaning that the mode is very distributed, whereas other modes may have 
power concentrated only over a small subensemble of neurons. These are the modes corresponding to 
local patterns of activity which only impact a small fraction of the total neural population. 

Finally, the power Am describes the overall impact of mode m on population activity. Indeed, although 
distribution vectors u™ and appearance variables Vm display the same normalization across modes, this 
does not mean that all modes are equivalent. Instead, only those modes with the largest values Am will 
truly impact the population, in the form of measurable changes of activity across neurons and trials. 
Conversely, modes with small values A„i will scarcely impact population activity, either because they 
involve only a small fraction of neurons, either because they are distributed but very weak. 

The overall number of modes M is equal to the rank of matrix S('^ , so it is by construction smaller 
or equal to the population size N (which we assume to be smaller than the huge number Q of possible 
realizations across trials). M defines the typical dimension of the manifold in which all neural activity 
occurs. In real neural populations, although N is itself a very large number, there are reasons to believe 
that M is sensibly smaller, due to correlated activity between neurons. 

1.3 Statistics of activity 

We now reinterpret classical measures of neural activity in the framework defined above. At this point, 
we need to carefully specify the nature of the ensembles truly available for measures: a finite subset JC of 
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neurons from the population, and a finite ensemble £ of trials (each element of £ providing one realization 
for stimulus f — and one realization for / = 1). 

For every neuron i G IC, recorded over trials u Q £, we compute the tuning to stimulus as 

■■=l{K{Sln-EiiSr)), (5) 

that is, the difference between the experimental mean firing rates in stimulus conditions / = 1 and / = 0. 
Similarly, we compute the noise covariance term between any two neurons i and j as: 

Cf, ■■= \ (Cov5(^^, S^-) + Cov^(^^, , (6) 

that is, the stimulus-averaged noise covariance between i and j. Finally, we introduce the total covariance 
matrix Af^ summing up all sources of variance across the population: 

4: = Cov^„(s/",5f) 

= Cf,.+&f6f. (7) 

The last line provides the classic decomposition of the total covariance matrix into noise covariance matrix 
and signal covariance matrix (6^)(b^)^ — which has rank 1 under our assumption of linear tuning to 
stimulus. 

When ensemble £ is equal to the full space of possible realizations, the above formulas define the 
"true" measures of covariance, as would be obtained given a sufficient amount of trials. In the sequel, we 
refer to these true, error-free values, by removing the mention to £. That is: bi, Cij and Aij. 

The SVD decomposition (eq. [T]) is best interpreted as a change of variables reexpressing neural ac- 
tivities {Si\i^i,,,N in terms of mode appearance variables {wm}m=i...M- As a result, we can define the 
respective equivalents of tuning, noise covariance and total covariance in the space of activity modes. In- 
deed, although mode appearance variables Vm are never directly observed, they still have some statistics 
across trials. We thus define: 

$f„„: = Covf„(z;/r,i;^"), 

which define tuning and total covariance in mode space (noise covariance being implicitly defined as 
-— (?7^)(J7^)^). Again, we will denote the true tuning and covariance by removing the mention to £: 
true tuning 77 and true total covariance Importantly, the normalization of variables Vm in eq.|4] implies 
that $ = Idjif. 

Mode powers and distribution vectors u™ then allow to relate the statistics at the levels of neurons 
and modes. Injecting the SVD formula (eq. [TJ into equations [s] and [7| yields (in matricial form): 

= UAr/^, (8) 
= UA*^AU^. (9) 

In particular, when true noiseless measures are considered so that $ = Idj\/, we see that U and A directly 
provide the standard (nonzero) eigenvalue decomposition of the total covariance matrix A, as 

A = UA^U^. 

•^Vector b from this appendix corresponds to Ufb from the main text, where = {f^) j — {f)^f gives typical variations 
of input stimulus. 
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2 SNR and PCV predictions 

We now wish to understand which factors determine the evolution of curve Z{K), the average SNR 
embedded in neural subensembles JC of cardinal K. We can also study the evolution of percept covariance 
(PCV) signals, in the same framework. 

In the main text, we compute SNR and PCV for ensemble K, through Fisher's linear discriminant 
(eq. 13-16). One sees easily that these definitions, involving tuning b and noise covariance matric C, are 
equivalently expressed in terms of tuning b and total covariance matrix A: 

- (blA^'b^)-' A^'b^, (10) 
Y{IC) = blA^'b^. (11) 

We cah Y the signal-to-total ratio (STR), which relates directly to SNR Z by the formula Y = Z/{l + Z). 
Y always takes values between [Z = 0) and \ [Z — oo), it thus avoids singularities which may occur in 
the direct Z formulation. If matrix A^; is rank-deficient, we consider its (Moore-Penrose) pseudoinverse 
without loss of generality (see further down). 



2.1 Total STR in the population 

The SVD decomposition (eq. [T]) reexpresses neural activity in the space of modes m — 1 . . . M. When 
the full neural population is considered, the full matrix A and vector b are involved in eq. 11 Using the 
SVD formulations (eq. |8][9| we thus find: 



Y{oo) 



b^A-'b 



AU^ (UA^U^ ) - 1 U Ar/ 



M 
m— 1 



(12) 



Thus, each mode contributes to total sensitivity by the strength of its intrinsic sensitivity ri„i- 

This computation can also be derived assuming a finite number of experimental trials £ . In this case 
however, we must introduce the experimental sensitivity of each mode, defined as 



(13) 



where (#^)~2 is the unique (Moore Penrose) pseudo-inverse of the symmetric, non- negative square 
root matrix of Actually, any other choice of matrix square root could also be used, because by 
construction > (''7^)(^^)^5 in the sense of symmetric positive matrices. This insures that rj^ is 
orthogonal to Ker($^), and thus the unicity of Cf' as defined in eq. 
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The computation of Y{oo,£) then goes along the same lines as previously: 



Y{oo,E) = {b^y{A^)-^b^ 

= (J7^)^AU^(UA**AU' 

ll/"£||2 



^UAry* 



(14) 



Generally, one expects Y{oo,£) > F(cxi), because the estimated is flatter than its true value of 
$ = IdM, with eigenvalues closer to 0. This is a classic result when estimating SNR (or STR) from an 
insufficient number of trials, a typical example of overfitting. As mentionned in the main text, there is no 
miracle cure to this problem, which should be addressed through appropriate methods of regularization 



and cross-validation! Hastie et al. , 2009j ) 
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2.2 STR for finite neural ensembles 

We now turn to the sensitivity embedded in finite subensembles K. from the population. The definitions 



of A.IC and b]c used in eq. 11 amount to a projection from the full neural space to subensemble K,: 

bfz = P/cb, 

where Pyc is the K x N orthogonal projector on recorded neurons IC. Through the SVD decomposition 
in eq. [8]|9j we reexpress these quantities as: 

= Blv (15) 
= dJd^, (16) 



where 



:= AU^Pj, (17) 



is our so-called data matrix, an M x K matrix with elements d™ :— \mU^ . It represents the experimental 
data from neurons K., expressed in mode space. 

To compute the resulting sensitivity predicted by eq. |11[ we note that through eq. |16[ matrix Ay^ has 
the same eigenvalues as its dual Gram matrix D/cDJ, an M x M matrix with rank d := min(i4r, M) — 
generally equal to K. We introduce the (compact) SVD decomposition of this matrix: 

DD^ =XT2X^, 



where > is a d X d diagonal matrix, and X is an M x d matrix of orthogonal columns (for clarity 
we remove the unambiguous references to ensemble /C). It is shown easily that this decomposition also 
provides the SVD for Aytc, in the form: 

Aye = (D^XT~^)T^(D^XT-^)^, 

where (D^XT"^) is a if X d matrix of orthogonal columns, as required in the SVD decomposition. Thus, 
the (pseudo-) inverse of A.^, writes: 

A^i = (D^XT"i)T"2(j)T-j^rp-i^T_ 
This allows to finally compute the experimental STR, from eq. [TSpB) 

Y{K.) = blA^'bfc 

= r/^D(D^XT"^)T"^(D^XT"^)^D^r7 
= T7^XT^X^XT"''X^XT^X^J7 
= r,T(XX^)r7, 

making use of the fact that X^X = Id^;. Intriguingly matrix T, which describes the eigenvalues of Ay^, 
disappears from the final equation. Only matrix X, corresponding to the eigenvectors of DD^, remains 
in the equations. We note Aye ■— Xy^Xj, which is nothing but the M x M orthogonal projector on 
Im(Dyc). This leads to the final expression: 

Y{JC)^V^A,cV- (18) 
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Neuron ensemble K, only appears through A^^. In particular, as soon as K is larger than the number 
of modes M, necessarily A^^ = ^^M^ and Y{JC) = 1^(00): all modes are available experimentally, and 
sensitivity estimates saturate to their maximum value, independently of ensemble K,. 

The whole analysis can be performed similarly assuming a finite number of measurement trials £. 
The only difference is a modification in data matrix D, to take into account the biases in mode space 
induced by an insufficient number of trials: D|. := ($^)3 AU^Pj, using the same square root of as 
in eg. |13[ Similar computations lead to the final result: 

r(/c,£) = (c^)^A|c^ (19) 



which depends on experimental mode sensitivities (eq. 13 1 and on Ay^;, the orthogonal projector on 
Im(Df;), of dimension d ~ min(i4r, Af, E). 

2.3 Percept covariance for finite readout ensembles 

Similarly to the approach above, we can express PCV signals in mode space. Since we do not model time, 
we only have access to the temporal average ttI := /„>o (^-f* ~ u)hw{u)du, where Tri{t) is the full PCV 
curve from the main text. From eq. 9 of the main text, it falls easily that W = Ca. Using the optimal a 



for readout ensemble /C (eq. 10 with a = Pyc^K: since a has support on /C), we thus predict: 

7f(/C)=r(/C)-iCPjA^i6^, 

which provides the value of Wi for every neuron i in the population (not only in ensemble IC). Making 
use of the same SVD decompositions as above, and of relationship C = A — bb^ , we finally find: 

7f(/C)+5 = r(/C)-i UAAycr/, (20) 

which expresses 7f(/C) as a linear combination of mode distribution vectors u™. As JC tends to the full 
population, A;c tends to Idjv/ and we get 7f(oo) — Y^^b— b — Z^^b, the prediction for choice signals in 



case of optimal readout (Haefner et al. 2013). 

In turn, the population average for PCV is W{IC) := Ei(6j7i7(/C)^ Using eq. 20 and the general fact 
that F,i{xiyi) = N~^K^y, we obtain 

WilC)+E,ibl) = N-^b'^ {n{IC) + b) 

= {NY{JC))-'q'^A^A,cV, (21) 

because b = TJAr] (eq.jsj) and U^U = Id. This reveals the interest of multiplying If^ by the corresponding 
tuning bi (see discussion in main text): it allows to get rid of the unknown distribution vectors U, and 
instead produce a quantity W which is directly related to the underlying modes' powers A and sensitivities 
V- 



As it appears in eq. 21 we note := Ei(&|) the average square tuning in the population. With 

hows that 

M 

B^^iV-VA^rj^AT-i^^. (22) 



similar arguments as above, one shows that 

M 



rn—l 



^which corresponds to the temporal integral J^^g W{tR — u\IC)hn]{u)du for the PCV curve Vl^(t|/C) defined in the main 
text (eq. 16). 
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2.4 Behavior with K 

We are now better armed to understand how sensitivity and PCV predictions vary as a function of the 
readout ensemble JC. We are mostly interested in averages of these quantities over randomly chosen 



ensembles K. of size K ; we thus use the generic notation Ex(a;) := E(x(/C)|Card(/C) — K). From eq. 18 
we find: Ek>" = J7^(EkA)?7. 

To understand the properties of the (M x M) matrix E^ A, we view the [M x K) data matrix D^^ 



(eq. 17 1 as a collection of K random vectors dj in mode space, viewing neuron identities i as the random 
variable. Thus, A^c is the orthogonal projector on the linear span of the K sample vectors {d^jig^c- As a 
projector, its trace is equal to its rank, so we have Tr(EifA) = K. Furthermore, since K + 1 samples span 
on average more space than K samples, we are insured that E^+iA }z E^A, in the sense of positive 
definite matrices. Finally, intuition and numerical simulations suggest that E/< A is almost diagonal. 
Indeed, as the various modes are linearly independent (eq. there is no linear interplay between the 
different dimensions of d across samples i: E,;((i™d") — N~^\^S™"', or equivalently 



Assuming a form of independence between X and T, it is reasonable to suppose that Ex(X;cXj) = E^^ A 
is close to diagonal as welj^ 

Assuming that E^A is diagonal, we note its diagonal terms {e^}m=i...A/ and consider the resulting 



approximations of sensitivity (eq. 18 1 and mean PCV (eq. 21 ) 

M 



EkY ^ J2 'kvL (23) 

m— 1 

M 

(y{W + B'))^ ^K^l^l- (24) 



The properties of Ea'A imply that e^i = K (trace property), and e^^^ > (growth property). As 
K augments, {e^} progressively "fiUs-in" the space of modes, starting from the modes with larger power 
\m- Indeed, the larger A^, the more often mode m appears in samples {d^}. As a useful image, we may 
think of the (very) rough approximation ~ l,„<if : only the K first modes are revealed by a sample 
of K neurons. Naturally this is only a gross approximation, as can be seen easily by considering a single 
sample i {K = 1). From intuition and simulation, the true shape of {e^} (at fixed K) is a "smoothed" 
version of lm<K, and the degree of smoothing depends on the power law governing the spectrum {Am}. 

With this image in mind, eq.[23]shows that the growth of sensitivity with K is linked to the progressive 
summation of mode sensitivities rj^, starting from modes with highest power X^: 

EkY /• r(oo), 

K 

with a saturation as soon as all nonzero mode sensitivities rj„i are revealed. Conversely , for PCV signals, 
we can make the rough assumption that Ek{WY) ~ EA'(VF)Ex(y), in which case eq. 



24 



rewrites 



where each mode m contributes with a weight e^?]^, and E^Y = ^^Vm provides the normalization 
factor. Thus, {Xf^)m,K reflects the average power of modes with the higher sensitivity, that are already 



rigorous proof might be accessible assuming a normal distribution for random vector d. In the general case, small 
deviations from diagonality can probably occur. 
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revealed with K neurons. As K grows, {e^} progressively "fills- in" modes in the order of decreasing A„ 
Thus we expect (A^),„^if to decrease with K. Finally, as soon a.s K > M, we have {e^} = {1}, and 



A2 

N I m.oo 



= N 



-lZ_mj=l ^ 

E 



2 2 

■m I'm 
M 2^ 



Y{oo)' 



reckognizing the expressions for B (eq. 22) and y(oo) (eq. 12). Since Y — I = Z , the predicted 



evolution of mean PCV signal with K follows: 



EkW 



B^ 



K 



Z{oo) 



> 0. 



W is predicted to be positive, to decrease with increasing size K, and to saturate at its minimum value 
once all significant mode sensitivities r]m have been revealed — which is also the moment when sensitivity Y 



saturates at its maximum value (eq. 23), and corresponds to an optimal readout from the full population. 



The implications of these results in terms of extrapolation to large K are discussed in the main text. 



